Moving data to the cloud: it's a batch

Information Management magazine published an interesting article dealing with the implications of moving data to the cloud, but rather than following the usual themes of security and reliability, it looked at the physical process of getting the data there in the first place.

Following on from an earlier blog post by Werner Vogels, Chief Technology Officer of Amazon, the article noted that transferring 1TB of data to the cloud would take 82 days over a T1 line or 13 days over a 10Mbps connection, and would still take up to two days with a 100Mbps link. Given that a large business would have significantly more than 1TB of data, it could be concluded that "using a public cloud is not as practical as initially predicted".
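
The quoted transfer times can be roughly reproduced with some simple arithmetic. The sketch below is an illustration only: the 80% link utilisation and the binary terabyte (2**40 bytes) are assumptions chosen because they land close to the quoted figures, not values stated in the article, and the function name is hypothetical.

```python
# Rough transfer-time estimate. The 80% utilisation and the binary
# terabyte (2**40 bytes) are assumptions, not figures from the article.
SECONDS_PER_DAY = 86_400
ONE_TB = 2**40  # bytes, assuming a binary terabyte


def transfer_days(data_bytes, link_mbps, utilisation=0.8):
    """Days needed to push data_bytes over a link rated at link_mbps megabits/s."""
    usable_bits_per_second = link_mbps * 1_000_000 * utilisation
    return data_bytes * 8 / usable_bits_per_second / SECONDS_PER_DAY


# A T1 line is rated at 1.544 Mbps.
for name, mbps in [("T1", 1.544), ("10Mbps", 10), ("100Mbps", 100)]:
    print(f"{name}: {transfer_days(ONE_TB, mbps):.1f} days")
```

Under these assumptions the T1 line comes out at a little over 82 days, the 10Mbps connection at just under 13 days, and the 100Mbps link at between one and two days. Scaling `data_bytes` up to tens of terabytes shows how quickly the problem grows for a large business.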

The issue is not helped by the fact that most businesses have a mixture of relational, non-relational and packaged application data sources, with the variety of data sources "typically reaching into the hundreds" and new ones added frequently. With the need to preserve business continuity while a migration takes place, IT departments are left with a significant challenge, and another possible barrier to cloud computing adoption.

According to the report, there are two methods for migrating data to the cloud. The first is to "batch load". To decide whether this is the best choice, the report suggests that the IT organisation should determine whether the data stored in the cloud will be shared among many applications or "compartmentalised" for use by individual solutions. If multiple applications use the same data sets predominantly for read-only purposes, then sharing these data sets is likely to be "safe". But if an enterprise copies data into multiple locations to improve local performance or to combine it with other sources, it may be better to adopt a consolidated model that can be ported to the cloud.

Once the data to be migrated has been identified, the IT department can create scripts to carry out the transfer, which will take "several hours to several days" depending on the volume of information to be sent. Data is also likely to change while the transfer is in progress, so any modified data must be replicated before the business moves fully to the cloud environment.
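
The batch-then-replicate pattern can be sketched as follows. This is a minimal illustration rather than the method described in the report: the two stores are hypothetical in-memory dictionaries mapping a key to a (value, version) pair, and a version number stands in for whatever change tracking a real source system offers.

```python
# Hypothetical in-memory stand-ins: each store maps key -> (value, version).
def batch_load(source, cloud):
    """Copy every record to the cloud and remember which version was copied."""
    copied = {}
    for key, (value, version) in list(source.items()):
        cloud[key] = (value, version)
        copied[key] = version
    return copied


def catch_up(source, cloud, copied):
    """Re-copy records that were added or modified after the batch load."""
    changed = [k for k, (_, ver) in source.items() if copied.get(k) != ver]
    for key in changed:
        cloud[key] = source[key]
        copied[key] = source[key][1]
    return changed


source = {"a": ("alpha", 1), "b": ("beta", 1)}
cloud = {}
copied = batch_load(source, cloud)

source["b"] = ("beta-v2", 2)  # modified while the transfer was running
source["c"] = ("gamma", 1)    # created while the transfer was running

changed = catch_up(source, cloud, copied)
print(changed)
```

After the catch-up pass the cloud copy matches the source again. The sketch does not handle records deleted during the transfer, which a real migration script would also need to consider.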

An alternative is data virtualisation, which abstracts the data from both the underlying sources and the applications that access it. The data model put in place for the data virtualisation layer can also serve as the initial data cloud model for the particular sets to be abstracted, and instead of being batch-loaded, data can be migrated to the cloud on demand. This also enables a phased migration, with some enterprise apps continuing to access data through the virtualisation layer and others accessing data in the cloud. Changes made in the cloud are automatically synchronised back to the originating data sources using the virtualisation middleware's pass-through capabilities.
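
To make the idea concrete, here is a small sketch of on-demand migration with pass-through writes. It is an assumption-laden illustration, not a description of any particular virtualisation product: the class name, its methods and the dictionary-backed stores are all hypothetical.

```python
class VirtualisationLayer:
    """Hypothetical facade: applications call read/write and never see
    whether a record currently lives in the original source or in the cloud."""

    def __init__(self, source, cloud):
        self.source = source  # originating data source (a dict here)
        self.cloud = cloud    # cloud store (a dict here)

    def read(self, key):
        if key not in self.cloud:
            # Migrate on demand: copy the record the first time it is requested.
            self.cloud[key] = self.source[key]
        return self.cloud[key]

    def write(self, key, value):
        self.cloud[key] = value
        # Pass-through: keep the originating source in step with the cloud.
        self.source[key] = value


source = {"order-1": "pending", "order-2": "shipped"}
cloud = {}
layer = VirtualisationLayer(source, cloud)

first_read = layer.read("order-1")  # pulls only this record into the cloud
layer.write("order-1", "paid")      # the change reaches both stores
```

Only the record that was actually requested is moved, so `order-2` stays in the source until something reads it. Applications that have not yet been migrated can keep using the source directly and still see the updated value.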

Data virtualisation can also ease the migration to the cloud of enterprise applications that were not originally designed for it, by providing a gateway between the applications and their data using standardised interfaces.

Blogger Anonymous