Ever since the Florida Update, SEO and SEM firms have been speculating about what Google did and why. We developed our own theory: we believe Google used some of the Applied Semantics technology to build one of the world's largest ontological databases. In other words, Google has been attempting to infer the meaning of pages and return results that match, or are similar to, that meaning.
Others dispute this and attribute the results to other algorithms, such as the 'Hilltop' algorithm (which shares many components with the ontological algorithms built into the Applied Semantics package), but none dispute that November 2003 was a turning point for Google.
It was shortly after this that many began to realize PageRank was playing less of a role in the overall rankings. Who would have thunk that Google was now saying "regardless of PageRank, sites that more closely match the query will be returned in the results"?
This led many to speculate that PageRank was dead and that we no longer need to focus on it, since it has little to no effect on final rankings. However, I'm here today to tell you that this isn't entirely true. The reason I think so has to do with how Google, the physical system, works.
If you don't already know, Google is made up of clusters of servers which store and serve the index. Other clusters filter, sort, and present the results, but it is the storage aspect I am going to focus on now.
In order to maintain its integrity, Google splits the index into 64-megabyte sections. These sections are then replicated across multiple clusters. This maintains integrity because many clusters can fail and the index can still be served correctly.
But this isn't the focus of this article. What I want to focus on is just one of those clusters.
If the index is split into 64 MB sections, then each cluster will obviously store multiple, non-contiguous sections. So while a cluster may not hold one entire index from beginning to end, it may hold a portion of it. It is on these individual clusters that the initial sorting happens.
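To make the storage picture concrete, here is a minimal sketch of how fixed-size sections might be spread across clusters. Only the 64 MB section size and the idea of replication come from the description above. The round-robin placement rule, the function names, and the counts (10 sections, 4 clusters, 2 replicas) are assumptions for illustration, not a claim about how Google actually places data.

```python
SECTION_SIZE = 64 * 1024 * 1024  # 64 MB, the section size described above


def section_count(index_bytes, section_size=SECTION_SIZE):
    """Number of fixed-size sections needed to hold the index (ceiling division)."""
    return -(-index_bytes // section_size)


def assign_sections(num_sections, num_clusters, replicas):
    """Place each section on `replicas` distinct clusters, round-robin.

    Hypothetical placement rule, chosen only because it is easy to follow.
    """
    if replicas > num_clusters:
        raise ValueError("cannot place more replicas than there are clusters")
    placement = {cluster: [] for cluster in range(num_clusters)}
    for section in range(num_sections):
        for r in range(replicas):
            placement[(section + r) % num_clusters].append(section)
    return placement


# An index exactly 10 sections long, spread over 4 clusters with 2 copies each.
placement = assign_sections(
    num_sections=section_count(10 * SECTION_SIZE),
    num_clusters=4,
    replicas=2,
)
print(placement[0])  # [0, 3, 4, 7, 8]
```

Under these assumptions, cluster 0 ends up holding sections 0, 3, 4, 7 and 8: a non-contiguous portion of the index rather than the whole thing, and every section still survives the loss of any single cluster.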
When a request is made (via the search box on Google), the clusters receive the request, sort through what they have stored, and return the top results. And wouldn't you know it? One of the most important factors the clusters use at this level is PageRank.
In other words, when a cluster receives a request, it determines which pages match and then orders the matches by PageRank. Of course I am oversimplifying here, but essentially this is what happens.
So let's say that in the 64 MB section where your site is stored, there are also 5 of your competitors. The cluster receives a request which says "send the top 3 sites matching this query." You can imagine what happens next: there are 6 sites matching the query, but only 3 can be sent. What is the cluster to do?
You guessed it: order by PageRank and send the top 3 sites.
So what if your PageRank is 4 but three of your competitors have a PageRank of 5 or higher? Chances are that your site won't be sent for this request, or for any other request which matches your site as well as theirs.
This is why I think PageRank is still important: if you don't pass this initial screening process, you won't show up in any results.
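The scenario above can be sketched in a few lines. This is a simplified illustration of the theory in this article, not Google's actual code: the `Page` type, the `top_k_by_pagerank` function, the example URLs, and the specific PageRank values for the competitors are all made up. The only numbers taken from the text are your PageRank of 4, three competitors at 5 or higher, six matching sites, and a limit of three.

```python
from typing import NamedTuple


class Page(NamedTuple):
    url: str
    pagerank: int
    matches_query: bool


def top_k_by_pagerank(section, k):
    """Keep only pages that match the query, then return the k highest by PageRank."""
    matches = [page for page in section if page.matches_query]
    # sort() is stable, so pages with equal PageRank keep their stored order.
    matches.sort(key=lambda page: page.pagerank, reverse=True)
    return matches[:k]


# One hypothetical section: your site, five competitors, and one
# high-PageRank page that does not match the query at all.
section = [
    Page("your-site.example", 4, True),
    Page("competitor-a.example", 6, True),
    Page("competitor-b.example", 5, True),
    Page("competitor-c.example", 5, True),
    Page("competitor-d.example", 3, True),
    Page("competitor-e.example", 2, True),
    Page("unrelated.example", 8, False),
]

sent = top_k_by_pagerank(section, 3)
for page in sent:
    print(page.url, page.pagerank)
# competitor-a.example 6
# competitor-b.example 5
# competitor-c.example 5
```

Note that the unrelated page is dropped despite its PageRank of 8, because matching the query comes first. Your site matches, but with three higher-ranked matches ahead of it, it never leaves the cluster.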
Of course, we already know that having a high PageRank doesn't guarantee you a top spot in the final search results. This is where the upstream processes come in, applying the semantic/Hilltop algorithms and so on before the results are served to the end user.
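Here is a rough sketch of why a high PageRank alone doesn't win. It assumes the upstream stage simply merges whatever the clusters send and re-orders it by some relevance score. The `rerank` function, the URLs, and the relevance numbers are placeholders standing in for whatever the semantic or Hilltop-style algorithms really compute.

```python
def rerank(candidates, relevance):
    """Order candidate URLs by an upstream relevance score, highest first."""
    return sorted(candidates, key=lambda url: relevance[url], reverse=True)


# What two hypothetical clusters sent upstream: (url, pagerank) pairs,
# each list already cut down by the cluster's own PageRank filter.
cluster_results = [
    [("a.example", 7), ("b.example", 5)],
    [("c.example", 6), ("d.example", 4)],
]

# Made-up upstream relevance scores; PageRank plays no part at this stage.
relevance = {
    "a.example": 0.40,
    "b.example": 0.90,
    "c.example": 0.65,
    "d.example": 0.80,
}

candidates = [url for results in cluster_results for url, _pagerank in results]
final = rerank(candidates, relevance)
print(final)  # ['b.example', 'd.example', 'c.example', 'a.example']
```

In this toy run the page with the highest PageRank (`a.example`, at 7) finishes last, while the page with the lowest PageRank that still got through (`d.example`, at 4) finishes second. PageRank decided who was allowed into the room, not who won.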
You see, I think PageRank still has some weight, but it's applied earlier in the ranking process, where it can be an effective filter. This way the upstream algorithms don't have to process as many sites, which reduces processing load and helps improve the speed with which search results are returned. Rather than returning 1 million results, the clusters combined may return only 750,000. That leaves 250,000 pages which do exist in the index, but to which the upstream algorithms never need to be applied.
This also tells me that as the index grows, higher PageRank values may become more important, because more sites will be competing for those top spots in order to move on to the next round of filtering.
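The arithmetic behind that example is simple enough to check directly. The two input figures are the illustrative ones from the paragraph above, not measured data, and `upstream_savings` is just a hypothetical helper name.

```python
def upstream_savings(matching, forwarded):
    """Return how many matches the upstream stage never sees, and that share of all matches."""
    skipped = matching - forwarded
    return skipped, skipped / matching


skipped, fraction = upstream_savings(matching=1_000_000, forwarded=750_000)
print(f"{skipped:,} results skipped ({fraction:.0%} of matches)")
# 250,000 results skipped (25% of matches)
```

So in this example the early PageRank cut saves the upstream algorithms a quarter of their work on this query.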
Let's go back to that original example - a request is now made for four sites in your cluster. If the same three competitors have a h