In Class Case – Google Datacenters
For this case study I would like you to review the Wired article about Google’s datacenters and share some of your thoughts about it. The article can be found here
- Note some of the interesting ways Google runs its datacenters.
- Google’s systems can be thought of as the ultimate server cluster. What benefits do they gain from that model, and are there downsides?
- Google has a special team called the “Site Reliability Engineering team”. Who are they and what do they do that is special for Google? Do smaller enterprises such as WSU or Sinclair have these sorts of assets?
Google is one of the largest information technology companies in the world, focusing primarily on providing information and data to its users. At the heart of Google’s operations are its data centers, often referred to as the “floor”: large, complex groups of networked computer servers used by organizations and corporations for the remote storage, processing, and distribution of large amounts of data. As the IT market has advanced rapidly and many powerful competitors have entered it, Google’s primary competitive advantage has become its huge network system. The security and secrecy of this network system are therefore of the utmost importance, both to Google’s daily operations and to its users. Consequently, only critical employees are allowed access to the network system or the data centers. To reinforce the security of the data centers, biometric authentication is used to control the doors to the “floor”.
For effective and continuous operations, Google has a team of Gudanets whose function is to populate the data centers with computers and keep them running smoothly at all times. Google has also distributed its servers across different locations around the world, with the primary objective of ensuring fault tolerance. To deliver content to users sustainably and cheaply, Google built its own data centers, which the company owns and manages. Although doing so was expensive, the long-term benefits were worth it. The most important observation about how Google develops and manages its data centers concerns sustainability. The company sought to reduce the electrical power used to cool its servers by introducing water-filled coils that absorb heat from the servers, directly reducing the electrical energy required by fans and other cooling systems.
A server cluster may be described as a collection of two or more computer servers that are connected to each other through fast networks and that work together to achieve high availability, reliability, and scalability. Google’s data center servers are connected together to form what can be thought of as the ultimate server cluster, which offers several advantages. The first benefit is fault tolerance. If one of the servers in the system fails, another server takes over its work, ensuring continuity of service and high availability. Second, the model enhances the scalability of the system. Servers and network devices can be added to or removed from the cluster as needed without causing errors or faults. Finally, the cluster model makes the entire system highly maintainable. If particular servers require maintenance, they can be conveniently stopped while other servers handle their load. The main downside of the model is that it requires huge amounts of capital to implement and maintain.
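The failover and scaling behavior described above can be illustrated with a small Python sketch. This is only a toy model under simplifying assumptions, not a description of Google's actual software: the `Cluster` class and its `route`, `mark_down`, `mark_up`, and `add_server` methods are hypothetical names invented for this example, and server health is set by hand rather than detected automatically.

```python
class Cluster:
    """Toy cluster that sends each request to the next healthy server."""

    def __init__(self, names):
        # Hypothetical health table: server name -> True if it can take work.
        self.healthy = {name: True for name in names}
        self._next = 0  # position where the next round-robin search starts

    def add_server(self, name):
        # Scalability: a new server joins without stopping the others.
        self.healthy[name] = True

    def mark_down(self, name):
        # Simulates a failure or a planned maintenance stop.
        self.healthy[name] = False

    def mark_up(self, name):
        # The server returns to service.
        self.healthy[name] = True

    def route(self, request):
        # Fault tolerance: skip unhealthy servers so that the remaining
        # servers take over the load of the failed one.
        names = list(self.healthy)
        for offset in range(len(names)):
            index = (self._next + offset) % len(names)
            if self.healthy[names[index]]:
                self._next = (index + 1) % len(names)
                return names[index]
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    cluster = Cluster(["a", "b", "c"])
    print(cluster.route("req-1"))  # a
    print(cluster.route("req-2"))  # b
    cluster.mark_down("c")         # server c fails
    print(cluster.route("req-3"))  # a (c is skipped)
```

In this sketch a failed server is simply skipped, so users still receive a response as long as at least one server is healthy. The sketch also hints at the downside noted above: the service stays available only if spare capacity exists, and that spare capacity is part of what makes the model expensive.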
Because Google serves hundreds of millions of users, continuous and reliable operation of its data centers is not optional but a responsibility that Google must meet. To ensure that its servers run reliably and continuously, Google established the Site Reliability Engineering (SRE) team. This team is made up of software engineers whose function is to design and develop ultra-scalable, reliable software systems for managing the data centers and the servers therein. The team’s role is to ensure that Google and its services keep running at all times. Although some smaller enterprises do not have such assets, they should consider having them as a means of ensuring continuous operations.