What is cloud?

I want to talk about cloud. Everyone seems to have heard of it, but not everyone understands what it means. In my opinion, the essence of cloud is resource aggregation, and the two pillars that make it possible are the ability to coordinate resources at large scale and the separation of resource ownership from resource usage.

Resource aggregation: the WHAT

To oversimplify, cloud is resource aggregation. AWS, Azure and Google Cloud are each estimated to run hundreds of thousands of servers, perhaps millions or more, all branded and accessed as one entity.

This astronomical scale gives them the ability to solve big problems that no regular data center can: to support a SQL database service with 100 queries per second, any traditional hosting will do; to support one with millions of queries per second, it is necessary to leave it to a big cloud.

Another example is big data. People usually hear the words “big data” and “cloud” together because cloud, with its aggregated computing resources, is one of the most promising ways to solve big data problems. Big data problems are big: they mean mining a haystack of billions of data points for a few needles of insight. A problem of that scale simply cannot be solved without, say, 10,000 computers. This is where cloud comes in.

Division of ownership and usage: the HOW, part 1

Separating resource ownership from usage is the key to making cloud possible. Why? Because there are too many users with different computing needs.

Imagine there is a law that forces everyone who needs computing to do it on a device he owns. A user could run into all kinds of trouble in that world: when his computer sits unused, its CPU is wasted; when he has a huge project to run, his computer is too slow, but it would be crazy to buy a faster one because this type of task comes up only once every 6 months; when his computer gets a virus, he cannot work; and when he travels, he has to bring his big desktop computer with him because his work cannot be processed on a smartphone CPU.

But when users become subscribers instead of owners of resources, everything seems peachy: all a user needs is a small device to log in to his cloud account; there, he can ask the cloud to do the computing at any time and send the result back to his device; the task can be big or small, and the cloud will deal with it and charge the user for CPU time; and because many users share the same hardware, the cloud rarely sits idle.

Consider OneDrive (or Google Drive, iCloud, what have you). When you store a file in OneDrive rather than on your local drive (note that storage is a type of computing service!), you have access to that file as long as you have a Microsoft account. You can even sell all your computers and phones, go on a world trip for a year, come back, and still be able to access that file, because you never had to be the owner of the storage device!

Coordinating the resources: the HOW, part 2

The ability to coordinate resources empowers clouds to do more with aggregated resources.

The coordination is usually done by software. As a matter of fact, an often-seen definition of “cloud” is “the software that runs the data center”, for example, AWS, Azure, OpenStack, WAP. These software systems excel at performance optimization and divide-and-conquer strategies. They know how to make the best use of each computing part in their pools, and they provide failover mechanisms: if a cluster fails, the computing can be carried over to another one with little added latency.
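To make the failover idea a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how any of the products named above actually work: the function run_with_failover, the cluster names, and the two stand-in cluster functions are all hypothetical, and a real cloud would also handle health checks, retries and state transfer.

```python
def run_with_failover(task, clusters):
    """Try each (name, run) pair in order; return (name, result) from the first that succeeds."""
    errors = []
    for name, run in clusters:
        try:
            return name, run(task)
        except ConnectionError as exc:
            # This cluster is down: remember why, then move on to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all clusters failed: " + "; ".join(errors))


# Hypothetical stand-ins for two clusters: one failed, one healthy.
def broken_cluster(task):
    raise ConnectionError("cluster unreachable")


def healthy_cluster(task):
    return sum(task)


used, result = run_with_failover(
    [1, 2, 3],
    [("cluster-a", broken_cluster), ("cluster-b", healthy_cluster)],
)
print(used, result)  # cluster-b 6
```

The caller never needs to know that cluster-a failed; it just gets the answer from whichever cluster was able to do the work.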

Without resource coordination, a data center with 1000 computers is no different from a collection of 1000 random computers in town. The coordination is what maximizes performance and minimizes cost through economies of scale.

For example, imagine there are 1000 people who each want to keep a copy of the Photoshop CS6 installation package. Say that package is 1GB in size. Without cloud (OneDrive, iCloud…), each of these 1000 people needs a computer with at least 1GB of free hard drive space. But with cloud, all it takes is 1GB of disk space on one of the servers in the data center, thanks to resource coordination.

Here is how the software would handle these 1000 requests: When person no.1 uploads this file to the cloud, the cloud stores it, marks it uniquely (with an MD5 hash, for example), and marks person no.1 as its owner. When person no.2 uploads the same file, the cloud detects that the file already exists and simply marks person no.2 as an additional owner. This goes on for everyone else. Even if at some point person no.1 deletes the file from his cloud account, it remains on the server's hard drive for the other owners. The cloud ends up storing only one 1GB file, with a list of its owners.
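The steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how OneDrive or any real service is implemented: the class DedupStore and its method names are made up for illustration, everything lives in memory, and the file is marked with SHA-256 from Python's hashlib instead of the MD5 mentioned above, because MD5 collisions can be crafted on purpose. A 1 KB byte string stands in for the 1GB package.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one copy per distinct file, many owners."""

    def __init__(self):
        self.blobs = {}   # content hash -> file bytes (stored once)
        self.owners = {}  # content hash -> set of owner names

    def upload(self, owner, data):
        # Assumption: SHA-256 replaces the MD5 "unique mark" from the text.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data  # first uploader: actually store the bytes
        self.owners.setdefault(key, set()).add(owner)  # later ones: just add an owner
        return key

    def delete(self, owner, key):
        holders = self.owners.get(key, set())
        holders.discard(owner)
        if not holders:
            # Nobody owns the file any more, so the space can be reclaimed.
            self.blobs.pop(key, None)
            self.owners.pop(key, None)

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
package = b"x" * 1024  # stand-in for the 1GB installation package
for i in range(1, 1001):
    key = store.upload(f"person-{i}", package)

store.delete("person-1", key)  # person no.1 removes his copy
print(store.stored_bytes(), len(store.owners[key]))  # 1024 999
```

After 1000 uploads and one deletion, the store still holds a single copy of the file, now with 999 owners.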

The benefits of resource coordination include not only better performance and lower cost, but also robustness, environmental benefits, etc.

Summary

Cloud is a way to solve problems by aggregating resources, based on the idea of separating resource ownership and usage, and the technology to coordinate such resources. Said technology is often referred to as “cloud” itself.