Sunday, August 15, 2010

Warehouse Scale Computing

This is a book about Google's data centers. It describes how efficiently Google handles its enormous volume of data and transactions using sophisticated algorithms.
Beyond processing efficiency, Google is also ahead in power (electricity) efficiency. Power Usage Effectiveness (PUE) is a metric that data centers use as a "standard" for telling whether a facility is "eco-friendly" or not.
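As a rough illustration, PUE is commonly defined as the total energy drawn by the facility divided by the energy delivered to the IT equipment, so a value closer to 1.0 means less overhead for cooling, power distribution, and so on. Here is a minimal sketch in Python; the function name and the numbers are made up for illustration and are not taken from the book.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical numbers, for illustration only: the whole building draws
# 1500 kWh while the servers, storage, and network gear draw 1000 kWh.
example = pue(total_facility_kwh=1500.0, it_equipment_kwh=1000.0)
print(f"PUE = {example:.2f}")
```

With these made-up numbers, every 1 kWh of actual computing costs an extra 0.5 kWh of overhead.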

I am a graduate student who is fascinated by how computers can contribute to better lives. Google's search engine is one of the computer systems handling the most transactions per day, week, month, and so forth. It is one of the biggest computers on this planet. Learning what such a computer is made of is really fascinating for me. Even though I do not fully understand it (heck, if I could understand it all, I would be working inside Google, hehe), what this book gives us is really important and insightful.

So, what is Warehouse Scale Computing?
If it were only about "scale", it would simply be called a "data center", as it used to be. A data center is a building where servers and storage with similar system and security requirements are housed in one place. In that sense, WSC (Warehouse Scale Computing) is a sibling of the data center. However, a conventional data center mainly hosts small to medium-scale applications. Each application is separated and protected, and runs on the hardware allocated to it. In such a traditional data center environment, multiple organizations or departments run their systems individually. There is little commonality among the hardware and software, and the components rarely communicate or synchronize with each other.

Today, many WSCs are up and running at companies like Google, Amazon, Yahoo, and Microsoft. Each company has established its own standards for running the system.
Yet what these companies' WSCs have in common is that the system is owned by a single dedicated organization (task force) and uses a rather homogeneous hardware/software platform. In a conventional data center, it is very common to use hardware and software from third-party vendors. WSCs tend to run their transactions on their own applications, their own middleware, and their own system architecture. In many cases, this is not an easy task. However, it is much better suited to what is demanded of computing today: cost reduction, process efficiency, and green IT.

IT systems are expected to be up and running properly 99.99% of the time, which allows only about an hour of downtime (roughly 53 minutes) in a year. While most data centers can barely keep up with this requirement, WSCs are literally on the "next level".
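The downtime allowed by an availability target is simple arithmetic, and a few lines of Python make it concrete. This is just a sketch: the function name is mine, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


# Compare a few common targets ("two nines" through "five nines").
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes/year")
```

For 99.99% this works out to about 52.6 minutes a year, a little under an hour.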



Cost Effectiveness:
Here are some of the hurdles a data center faces. Clearing them is the minimum requirement.
  1. A growing number of services and users results in a heavy load of transactions and queries.
  2. The size of the problem (if one occurs) grows like a snowball. Every day, the web grows by millions of pages, and so does the number of index entries. The web keeps growing while the business keeps demanding cost reductions.
  3. Even if the system could get by with the same throughput and data volume, web indexing (search engine/user experience) is constantly and continuously being improved. Needless to say, this also adds transactional workload to the data center. Of course, some improvements do not increase the transactional volume (e.g., improvements to search algorithms, index optimization, etc.).

Data Center; One vs Many:
Usually, a data center in a single physical location is regarded as "one" data center and as one "computer". In WSC, the idea is to treat multiple data centers as one computer. By doing so, the system can reduce redundant data replication and latency, and improve overall transactional efficiency.

Why you should know about WSC...
You may think, "This is such high-end technology that it only concerns IT companies with gigantic databases." Well, that is not true. With the current technological growth in the storage/server business (as well as in processors), these facilities will be available to ordinary users at very reasonable prices in a few years.
Just as you study IFRS if you are an accountant, this stuff will become the de facto standard, so knowing it will be very helpful.