A brief introduction to “The Cloud”
To many, “The Cloud” is just another buzzword for “datacenter”. For the most part that is correct. However, the word “Cloud” implies a more specific architecture for how the individual machines in the datacenter relate to one another.
Clusters, load balancing and fail-over
Traditional datacenters already guaranteed uptimes of 99.9% or better. This can only be accomplished by clustering servers together, so that another node can take over as the active machine while one is being rebooted for OS updates or taken offline for hardware upgrades or replacements. On Windows, fail-over and load-balancing features date back to NT 4.0 Enterprise Edition, released in 1997.
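To put an uptime percentage in perspective, it helps to convert it into the downtime it still permits. The following is a minimal sketch of that arithmetic; the function name and the three sample targets are chosen here for illustration and are not taken from any particular provider's agreement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# Illustrative targets only; real agreements define their own figures.
for target in (99.9, 99.95, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours of downtime per year")
```

Even at 99.9%, fewer than nine hours per year remain for every reboot, patch and hardware swap combined, which is why a single machine cannot realistically meet such a target.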
Virtualization
Virtualization introduced features that overlap with those of clustering. For example, a running virtual machine can be migrated to different hardware without any noticeable downtime. This satisfies the hardware-related needs, but does not solve the reboots needed for software updates. It is conceivable to clone a virtual machine, update and reboot the clone, and then switch IP addresses manually, but the hassle and the risk of the two machines conflicting during the transition would make it anything but seamless. Also, with clusters, after a switch of nodes has taken place, the previously active node can still finish handling all the requests and running processes it had started before the switch.
Virtualization has complemented clusters but not replaced them. Similarly, a RAID array does not eliminate the need for backups. If disaster strikes (say, a virus starts deleting files), all disks in the array are affected, so RAID arrays have not removed the need for backups.
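The difference between a cluster switch and a manual IP swap is easier to see in a toy model. The sketch below is not how any real cluster product is implemented; the Node and Cluster classes and the request names are hypothetical and exist only to show the old node draining its work while the new node takes over.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Requests this node has accepted but not yet finished.
    in_flight: list = field(default_factory=list)

    def finish_all(self) -> list:
        """Complete every request this node already accepted."""
        done, self.in_flight = self.in_flight, []
        return done


class Cluster:
    def __init__(self, active: Node, standby: Node) -> None:
        self.active = active
        self.standby = standby

    def accept(self, request: str) -> None:
        # New requests always go to whichever node is currently active.
        self.active.in_flight.append(request)

    def switch(self) -> Node:
        """Swap roles and return the previously active node so it can drain."""
        self.active, self.standby = self.standby, self.active
        return self.standby


cluster = Cluster(Node("a"), Node("b"))
cluster.accept("r1")
cluster.accept("r2")
draining = cluster.switch()      # node "a" stops receiving new work
cluster.accept("r3")             # routed to node "b"
drained = draining.finish_all()  # node "a" still completes r1 and r2
```

No request is dropped during the switch: the ones node "a" had already accepted are finished there, and only new ones go to node "b".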
Cloud computing
The Azure operating system is a hybrid between clustering and virtualization. In addition, it partly abstracts the operating system away from several services that would typically run on clusters, for example IIS and BizTalk Server. To what extent the OS of the virtual machines has been eliminated for these services is not clear. It is clear, however, that there is less overhead when running instances of these services in Azure than when running them inside clustered virtual machines.
Amazon has done something similar with its Elastic Beanstalk service. Beanstalk runs on Amazon’s own EC2 instances rather than on Azure, but the effect is comparable: part of the OS is abstracted away from the IIS instances. It is possible to dynamically create a virtual machine around them in order to debug, but Amazon explicitly warns that any files or configuration changes made to anything other than the IIS service will not be preserved and will be lost after disconnecting from the virtual machine.
Scalability
One of the most marketed arguments for cloud computing is “scalability”: the ability to start small and add more hardware power to the service the moment it is needed.
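In its simplest form, scaling comes down to a rule that maps the current load to a number of instances. The sketch below assumes a hypothetical service in which each instance handles a fixed number of requests per second; the function name, the capacity figure and the sample loads are all made up for illustration and do not describe the scaling rules of Azure or Amazon.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float,
                     minimum: int = 1) -> int:
    """Smallest instance count that covers the load, never below `minimum`."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(minimum, math.ceil(requests_per_second / capacity_per_instance))


# Hypothetical traffic samples over a day, in requests per second.
for load in (20, 180, 950, 60):
    count = instances_needed(load, capacity_per_instance=100)
    print(f"{load} req/s -> {count} instance(s)")
```

The service starts with a single instance, grows to ten during the peak and shrinks back afterwards, so hardware is only paid for while it is needed.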
Other aspects and virtual machines
Apart from abstracting services from the operating system, both Azure and Amazon provide virtual machines, storage, databases, redundant distribution and content delivery features.
A big difference between an Azure virtual machine and an Amazon EC2 instance is that those on Amazon persist any data stored on the VHD by default. On Azure, one is supposed to create a fixed image for the system and store any data that should be persisted on a separate “storage” drive. This simplifies the task of shifting resources when virtual machines are clustered: a machine can simply be shut down and started elsewhere. On Azure, this behavior has led to a lot of confusion and complaints about lost data.
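The fixed-image model can be illustrated with a small simulation. This is only a sketch of the idea, not of Azure's actual implementation: the VirtualMachine class, its attributes and the file names are hypothetical, and plain dictionaries stand in for disks.

```python
class VirtualMachine:
    def __init__(self, image: dict, storage: dict) -> None:
        self.image = image              # fixed template for the system disk
        self.storage = storage          # separate store that survives restarts
        self.system_disk = dict(image)  # working copy, discarded on relocation

    def relocate(self) -> "VirtualMachine":
        """Shut down here and start elsewhere from the fixed image."""
        return VirtualMachine(self.image, self.storage)


image = {"os": "windows"}
storage = {}
vm = VirtualMachine(image, storage)

vm.system_disk["upload.txt"] = "written to the system disk"
vm.storage["orders.db"] = "written to the storage drive"

vm = vm.relocate()  # the platform moves the machine to other hardware
```

After the relocation, "orders.db" is still available through the storage drive, while "upload.txt" is gone because the system disk was rebuilt from the image. That is exactly the kind of loss that surprises people who expect the disk to behave like one in a physical server.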