How does Google do it?

by Admin


24 Jun
Search Engines


by Rob Sullivan


http://www.enquiro.com

I read an interesting article the other day, an interview with one of the chief IT people at Google, and it made a point I hadn't really considered. The discussion was about managing the cluster of servers that is Google, and how much work that must be. This IT person went on to say that Google had developed its own custom patch management system in-house to manage the changes required to its software.

That got me to thinking. Estimates have put the Google cluster at anywhere between 10,000 and 80,000 servers, working in tandem, to ultimately produce that search box you see when you go to Google.com.
So even if we assume that there are 10,000 servers spread across multiple data centers, management of these servers must be a huge task. Just take a minute to think about that - 10,000 servers spread across 13 data centers many thousands of miles apart.

And it's not just a matter of keeping the operating system up to date and secure. Imagine applying algorithm changes to these servers. Do you think their IT department spends its time at a management console manually applying algo changes? I don't think so.

It occurred to me that this custom patch management system is more than just a way to keep a huge number of servers patched. It probably also doubles as the framework for applying changes to the system in general, including algorithm changes as well as new products and improvements.

Even at that... it is still an impressive system.

For those who don't know how this could work, let me outline it for you. If you are familiar with the Windows Update system this should be fairly simple.

Essentially, with Linux, you can set up an automated query that checks a central repository for new updates. The update program can then download and even install them with a little user input.
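As a rough illustration of that check-for-updates step, here is a minimal Python sketch. The package names, version numbers, and the `find_updates` helper are all made up for the example; a real client would fetch the repository's index over the network rather than read it from a dictionary.

```python
def find_updates(installed, repository):
    """Return {package: (installed_version, available_version)} for outdated packages."""
    updates = {}
    for name, current in installed.items():
        available = repository.get(name)
        # Version tuples compare element by element, so (2, 4, 21) < (2, 4, 26).
        if available is not None and available > current:
            updates[name] = (current, available)
    return updates


# Hypothetical data: what one server has installed, and what the repository offers.
installed = {"kernel": (2, 4, 21), "openssh": (3, 6, 1), "apache": (1, 3, 29)}
repository = {"kernel": (2, 4, 26), "openssh": (3, 6, 1), "apache": (1, 3, 31)}

updates = find_updates(installed, repository)
for name, (old, new) in sorted(updates.items()):
    print(name, old, "->", new)
```

Only the packages whose repository version is newer end up in the result, which is all the update program needs in order to decide what to download.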

But having Google's entire server base query Red Hat, FreshRPMs, or someone like that wouldn't be very efficient, and would likely bring down the servers that host these repositories.

So I am certain that they have built a system whereby one central system knows the configurations of all the servers, requests the updates itself, and then pushes them out to the server clusters.

This too would be how they apply algorithm changes: a change is made and pushed to the distribution server, which then pushes it out to the clusters.
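To make the idea of a distribution server concrete, here is a small Python sketch of one machine that knows each server's configuration and pushes a change only to the servers it applies to. This is purely my own guess at the shape of such a system, not anything Google has described; the `Server` and `DistributionServer` classes, the role names, and the update ID are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    role: str                                   # hypothetical roles: "index" or "web"
    applied: list = field(default_factory=list)

    def apply(self, update_id):
        # Stand-in for actually installing a patch or a new algorithm build.
        self.applied.append(update_id)


class DistributionServer:
    """Knows every server's configuration and pushes matching updates to it."""

    def __init__(self, servers):
        self.servers = servers

    def push(self, update_id, target_role):
        """Push one update to every server with the given role; return their names."""
        reached = []
        for server in self.servers:
            if server.role == target_role:
                server.apply(update_id)
                reached.append(server.name)
        return reached


fleet = [Server("idx-1", "index"), Server("idx-2", "index"), Server("web-1", "web")]
distributor = DistributionServer(fleet)

reached = distributor.push("ranking-change-001", target_role="index")
print(reached)
```

The point of the design is that the individual servers never talk to the outside world; only the distribution server needs to know where updates come from and which machines should receive them.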

This would also account for what used to be called the "Google Dance," where, for about a week each month, different servers would update at different times.

If they do, in fact, have this central repository of their own, and all the clusters request updates from it, they don't want to overload the repository with 10,000 simultaneous requests. Therefore the updates are staged on some sort of schedule. And since algorithm changes are also likely pushed out on a similar schedule, that could account for the dance.
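Here is a minimal Python sketch of what staging on a schedule could look like: the servers are split into waves, and each wave updates on a different day. The wave size of 1,500 is an assumption I picked so that 10,000 servers take about a week, and `stage_rollout` is a hypothetical helper, not a real tool.

```python
def stage_rollout(server_ids, wave_size):
    """Split servers into consecutive waves of at most wave_size servers each."""
    if wave_size <= 0:
        raise ValueError("wave_size must be positive")
    return [server_ids[i:i + wave_size] for i in range(0, len(server_ids), wave_size)]


servers = list(range(10_000))                      # the low estimate of 10,000 servers
waves = stage_rollout(servers, wave_size=1_500)    # assumed wave size, one wave per day

for day, wave in enumerate(waves, start=1):
    print(f"day {day}: {len(wave)} servers")
```

With these numbers the repository never sees more than 1,500 requests on a given day, and the full rollout takes seven days, during which different servers are running different versions.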

Of course, the dance is now more or less obsolete because of the constantly updating index, but the theory of patch application and management stands. That means that when we start to see significant changes in one index, we should eventually see the same changes across all data centers.

The reason I started thinking about all of this is that Google intends to open a research center in Australia and, at around the same time, plans to release some of its code to the open source world.

I don't think the code they release will be anything proprietary, like algorithms. I do think this patch management system (or parts of it) will be what they release. After all, any other large network running Unix/Linux could benefit greatly from this type of system.

Of course, I'm sure I've oversimplified this system greatly. If Google is using such a system, it must also account for many hundreds of client workstations, which likely have multiple configurations depending on users and preferences, and that adds to the complexity. But then again, Google has dozens of programmers with PhDs, so I'm sure it isn't an insurmountable problem for them.

Rob Sullivan
Production Manager
Enquiro.com


Copyright 2004 - Searchengineposition Inc.

