Matt Cutts #9: All about datacenters
Here’s the ninth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s first!
Transcription
OK! This is Matt Cutts, coming to you live from the Mattplex. It’s Monday, July 31st. And I am wearing a different shirt, so it’s not all one big take. In fact, it’s my werewolf versus unicorn shirt. That’s right, you’ve got the unicorn and the werewolves. Mortal enemies since the beginning of time and “It’s On Now”.
Alright! So, this should better be a special session. Let’s take a fun question from g1smd. They ask,
“For all the datacenter watchers out there. Should all results across one class C IP address block be the same most of the time, except when you are pushing data or they are supposed to be different because you are trying different things on them? And, would it make more sense to use the direct IP addresses when reporting issues or problems, or the 41gfe datacenter names?”
Alright. Well! Let’s talk about datacenters.
Back in the days of dinosaurs, you know, when the dinosaurs roamed the earth, you could actually run a search engine off of one computer. And those days are long since gone unless you have a really, really powerful computer, or something very, very small to search over, or you have a Google Search Appliance, I guess. So, these days you pretty much have to have a datacenter. And in the early days of datacenters you could just do, you know, some sort of round robin trick with DNS, so that you always hit different datacenters. Google does some very smart stuff in load balancing, some very interesting techniques to try to make sure that different datacenters are able to perform well.
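As a rough illustration of the round robin DNS trick mentioned above, here is a minimal sketch in Python. It only simulates the rotation; the addresses are hypothetical placeholders from the reserved documentation ranges, not real Google datacenter IPs, and it says nothing about how Google's own load balancing works.

```python
from itertools import cycle

# Hypothetical datacenter front-end addresses (reserved documentation ranges,
# not real Google IPs).
DATACENTER_IPS = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]


def round_robin(addresses):
    """Return an endless iterator that hands out addresses in rotation,
    the way a simple round robin DNS setup rotates its answers."""
    return cycle(addresses)


resolver = round_robin(DATACENTER_IPS)

# Four consecutive "lookups": the fourth wraps back to the first address.
first_four = [next(resolver) for _ in range(4)]
print(first_four)
```

Each lookup lands on the next address in the list, so consecutive visitors are spread across datacenters without any knowledge of how busy each one is.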
So your basic question was this: should all things on the same Class C IP block be roughly the same? And yes, they should roughly be the same in that they are typically the same datacenter. But not always. Let me give you a couple of examples.
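For readers unsure what "the same Class C IP block" means in practice, it is commonly taken to mean that two IPv4 addresses share their first three octets, that is, the same /24 network. A small sketch of that check using Python's standard `ipaddress` module follows; the function name `same_class_c` and the sample addresses are assumptions made for illustration.

```python
import ipaddress


def same_class_c(ip_a, ip_b):
    """Return True if both IPv4 addresses fall in the same /24 block."""
    # strict=False lets us build the network from a host address.
    block = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in block


# Hypothetical addresses from a reserved documentation range.
print(same_class_c("192.0.2.17", "192.0.2.200"))
print(same_class_c("192.0.2.17", "192.0.3.1"))
```

As the transcript notes, sharing a block only makes it likely, not certain, that two addresses are served by the same datacenter.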
If one datacenter has to fail over or if one datacenter is out of rotation, then even if you are going to one IP address, you can get bounced over to a different datacenter. And even though it will look like you are consistently hitting the same datacenter, behind the scenes, underneath Google’s load balancing, you could be hitting a different datacenter completely. So, those situations are somewhat rare but not that rare. So that’s why sometimes when you see people having debates online at WebmasterWorld or Datacenterwatcher and stuff like that, they can actually be seeing different things, even if they hit the same IP address.
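The failover behavior described above can be pictured with a toy model. This is only a sketch under loud assumptions: the datacenter names, the mapping table, and the fallback order are all hypothetical, and Google's actual load balancing is not public and is certainly more sophisticated.

```python
def serving_datacenter(ip, primary_for_ip, fallback_order, in_rotation):
    """Pick the datacenter that answers a request sent to `ip`.

    All arguments are hypothetical: `primary_for_ip` maps an IP to its usual
    datacenter, `fallback_order` lists alternatives, and `in_rotation` is the
    set of datacenters currently serving traffic.
    """
    primary = primary_for_ip[ip]
    if primary in in_rotation:
        return primary
    # Primary is out of rotation: bounce to the first available alternative.
    for candidate in fallback_order:
        if candidate in in_rotation:
            return candidate
    raise RuntimeError("no datacenter available")


primary_for_ip = {"192.0.2.10": "dc-a"}  # made-up IP and datacenter names
fallback_order = ["dc-b", "dc-c"]

normal = serving_datacenter("192.0.2.10", primary_for_ip, fallback_order, {"dc-a", "dc-b"})
failed_over = serving_datacenter("192.0.2.10", primary_for_ip, fallback_order, {"dc-b", "dc-c"})
print(normal, failed_over)
```

The same IP address gives results from two different datacenters depending on what is in rotation, which is why two people querying one address can see different things.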
The other point I wanted to make, and I made this at Pubcon Boston, was that the datacenters often have a lot of different things going on. So whenever there is a new algorithm update or some other feature that we are trying out, we often try it out on one datacenter first, to make sure the quality is what we have expected it to be based on evaluation, stuff like that.
So the datacenters do differ, you know, according to some very complex, intricate plans, so that we can try out different things at different datacenters. Typically, on one class C IP address, you will usually hit the same datacenter, but that’s not guaranteed. Also, at Pubcon Boston, I showed a list of, an example of the sorts of different things that are going on at different datacenters. It sort of shows how things are a lot more intricate now than they used to be, and so Google does a lot more smart scheduling, and it’s a lot harder for a random person to just look at a datacenter and reverse engineer or try to guess, you know, which way things are going, stuff like that.
As far as IP address versus the GFE name, which I think exactly me and g1smd know about, no one else really bothered to talk about, except maybe on WebmasterWorld, you can use either the IP address or, you know, the two-letter code of a datacenter, because we are able to map them both back. If you tell us one, we can tell what the other one is, either way.
In general though, there are probably better ways to spend your time than watching datacenters. I think it’s a good use of your time to work on your content, and a good use of your time to look whenever something major is going on, whenever there is a PageRank update or something like that. But, in general, there is enough stuff going on at different datacenters that I would say it’s probably not worth checking every single datacenter, every single day, to try to figure out, ‘OK, how am I going to do or how have I been doing’. It’s probably better to spend a little more time paying attention to your logs and work backwards based on that.
Transcription thanks to Peter T. Davis