While most Google updates come in the form of changes to the search engine’s ranking algorithm (e.g. the Penguin Update), the Caffeine Update of June 2010 is a different story. Like a strong espresso on a Monday morning, the Caffeine Update worked to freshen up search results by changing the way Google collects and organizes information from across the web. While certainly a big move on Google’s part, this update was a little more behind the scenes than the more tangible algorithm updates we all love to hate. The idea was not to have an immediate impact on the user’s search experience, but rather to prepare for the rapid and constant growth of information held on the web. It was a change to the how rather than the what. As an infrastructure update, Caffeine quietly overhauled Google’s very foundations and replaced them with something more versatile, more efficient, more, well… refreshing!

Curating the world wide web

Understanding how the Caffeine update works requires a bit of background on how Google works more generally (if this is all pretty obvious to you, just skip over to the next section). The world wide web is an enormous place, consisting of an unthinkable number of websites and their individual pages. To make things more manageable for the average web user, Google sifts through all these pages, stores them, organises them and then very kindly displays them to us when we search for particular terms and queries. The initial process of reading through web pages is called ‘crawling’, while the process of collating and listing them afterwards is called ‘indexing’. Make sense so far? Good!
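If you like to see ideas in code, here is a toy sketch of crawling and indexing in Python. It is nothing like Google's real system: the two pages, their text and the function names (`build_index`, `search`) are all made up for illustration. The index here is simply a lookup from each word to the pages that contain it.

```python
from collections import defaultdict

# Hypothetical stand-in for crawled pages: URL -> page text.
PAGES = {
    "example.com/coffee": "fresh coffee beans roasted daily",
    "example.com/tea": "fresh tea leaves",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():      # 'crawling': read through each page
        for word in text.lower().split():
            index[word].add(url)         # 'indexing': file the page under each word
    return index


def search(index, word):
    """Return the URLs listed under a word, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))


index = build_index(PAGES)
print(search(index, "fresh"))    # both pages mention 'fresh'
print(search(index, "coffee"))   # only the coffee page
```

Searching then becomes a quick lookup in the index rather than a fresh read of every page, which is the whole point of indexing ahead of time.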

Since Google’s index contains every single web page that you can reach through their search engine, it is pretty much as big as you’d expect. In fact, at the time of writing, its size is estimated at a whopping 46 billion pages! With little Google bots constantly whizzing around the web looking for new pages, that number is quickly growing as the web continues to expand. Google’s indexing infrastructure is integral to this ongoing process of growth and adaptation, which is why the Caffeine Update was so necessary at the time. Google clearly felt that the infrastructure needed updating in order to keep up.

Out with the old…

By now we’re all pretty used to the concept of ‘information overload’, but in 2010 things were only just picking up momentum. More people were using the internet, more information was being distributed and the demand for fresher and more relevant content was higher than ever before. Google’s infrastructure at the time had been designed for a much smaller internet, back when the web was a lot more manageable and real-time indexing wasn’t such a pressing matter. With this old indexing system, Google bots crawled web pages in large sets (or batches), and a crawled page wouldn’t be published until the entire set had been crawled and processed. As a result, pages spent a long time ‘on hold’ before being made available to users.

This wasn’t so much an issue for existing pages that were already indexed and available on Google, but it was problematic for new pages containing fresh and time-sensitive content (e.g. breaking news stories) which needed to be indexed a lot faster. Google was quickly becoming a go-to source for fresh information, and their infrastructure at that point lacked the speed and capacity to quickly process large swathes of new content. Carrie Grimes of Google said it best: “searchers want to find the latest relevant content and publishers expect to be found the instant they publish.” In other words, things had to change!

In with the new!

It’s important that Google’s indexing system works efficiently so that their search results stay broad, relevant and fresh, and that’s why we got the Caffeine Update. Put simply, Caffeine made the indexing process a lot quicker. With the new infrastructure, pages are individually pushed through the crawling / processing / indexing funnel and published live on results pages almost immediately. Unlike the old batch processing system, Caffeine’s ‘incremental indexing’ method was able to keep pace with the fast-moving nature of web content and information flow. Ultimately, Caffeine was built for the same purpose as every other Google update: to help us find the exact results we want, and to find them much faster.
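To make the difference between batch and incremental indexing concrete, here is a rough Python sketch. It is a simplified model rather than Google's actual pipeline: the page names, the `batch_size` parameter and the idea of counting time in 'steps' (one step per page processed) are all assumptions made for illustration.

```python
def batch_index(pages, batch_size):
    """Old style: a page goes live only once its whole batch is processed.

    Returns {url: step at which the page became searchable}.
    """
    live_at = {}
    for start in range(0, len(pages), batch_size):
        batch = pages[start:start + batch_size]
        finished = start + len(batch)    # step at which this batch completes
        for url in batch:
            live_at[url] = finished      # every page waits for the slowest one
    return live_at


def incremental_index(pages):
    """Caffeine style: each page goes live as soon as it is processed."""
    return {url: step for step, url in enumerate(pages, start=1)}


pages = ["news-1", "news-2", "news-3", "news-4"]
print(batch_index(pages, batch_size=4))   # every page goes live at step 4
print(incremental_index(pages))           # pages go live at steps 1, 2, 3, 4
```

In the batch version the first news story sits ‘on hold’ until the last page in its batch is done, while in the incremental version it is searchable straight away.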