Google began testing Caffeine last August, and it was suggested at the time that Caffeine would go live “after the holidays.” The new engineering platform was introduced a month after Microsoft upped the ante in the search war by extending its new Bing search engine to all of Yahoo’s Web properties. The Yahoo Web portal is the number two player in search.
Google’s old search index contained several layers, some of which were refreshed faster than others; the main layer was updated every couple of weeks. Under the old method, Google would crawl the entire Web and update large batches of Web pages in its index at once. In essence, this meant that content was not added to the Google search index until its layer was refreshed, so there was a significant delay between publishing content and having it show up in search results.
“Caffeine delivers 50 percent fresher and faster results for Web searches than our last index, and it is the largest collection of Web content we have offered,” the company says in a news release on its official blog. “Whether it is a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before.”
To understand how Caffeine works, one must first know that Google does not search the entire Web to answer user queries, only its index of the Web. The quality of results depends on how well a search engine can keep that index up to date.
Here is a little search engine 101: When you perform a search, you are not actually searching the live web. Instead, you are searching an index of the web, much like the index at the back of a book that directs you to the specific information you are looking for.
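To make the book-index analogy concrete, here is a minimal sketch of an inverted index in Python. It illustrates the general idea only and is not Google's implementation; the page URLs, the page text, and the function names `build_index` and `search` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical mini "web": URL -> page text (illustrative data only).
PAGES = {
    "example.com/a": "caffeine is a new search index",
    "example.com/b": "bing powers yahoo search",
    "example.com/c": "fresh news and blog posts",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, consulting only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "search")))  # ['example.com/a', 'example.com/b']
```

Note that `search` never touches `PAGES`: a page that is published but not yet indexed cannot be found, which is why index freshness matters so much.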
Google software engineer Carrie Grimes said Google is launching Caffeine to keep up with the development of the web and to meet rising user expectations. In a nutshell, with Caffeine, Google simply crawls the Web in smaller portions and updates its index on a continuous basis.
“Content on the web is flourishing. It is growing not just in size and numbers, but with the advent of video, images, news and real-time updates, the average web page is richer and more complex,” she said. “Moreover, people’s expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.”
“As we discover new pages, or new information on existing pages, we can add these straight to the index,” Grimes said in the company’s blog.
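The difference between the old layered refresh and Caffeine's continuous updates can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not a description of Google's actual systems: the class names `BatchIndex` and `ContinuousIndex`, the helper `add_page`, and the sample URL are hypothetical, and the sketch ignores removing stale words when a page changes.

```python
def add_page(index, url, text):
    """Record every word of a page in a word -> set-of-URLs index."""
    for word in text.lower().split():
        index.setdefault(word, set()).add(url)


class BatchIndex:
    """Old-style sketch: crawled pages wait until the whole layer is refreshed."""

    def __init__(self):
        self.index = {}    # what searchers can see
        self.pending = {}  # crawled but not yet searchable: url -> text

    def crawl(self, url, text):
        self.pending[url] = text

    def refresh_layer(self):
        for url, text in self.pending.items():
            add_page(self.index, url, text)
        self.pending.clear()


class ContinuousIndex:
    """Caffeine-style sketch: each crawled page goes straight into the index."""

    def __init__(self):
        self.index = {}

    def crawl(self, url, text):
        add_page(self.index, url, text)


batch, continuous = BatchIndex(), ContinuousIndex()
for idx in (batch, continuous):
    idx.crawl("example.com/news", "breaking story")

print("story" in batch.index)       # False: still waiting for the refresh
print("story" in continuous.index)  # True: searchable immediately
batch.refresh_layer()
print("story" in batch.index)       # True: visible only after the refresh
```

In the batch model, the gap between `crawl` and `refresh_layer` is the publishing delay the old index suffered from; the continuous model removes that gap by design.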
Caffeine examines hundreds of thousands of Web pages each second in parallel and adds new information to the index at a rate of hundreds of thousands of gigabytes per day, according to Google. Caffeine takes up nearly 100 million GB of storage in one database.
“You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles,” Grimes said.
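The figures in that comparison can be checked with quick arithmetic. The short Python sketch below uses only the numbers quoted above (100 million GB, 625,000 iPods, 40 miles) plus the standard miles-to-meters conversion; the variable names are ours, and the per-device values are what the quote implies, not official specifications.

```python
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million GB", per Google
IPOD_COUNT = 625_000            # number of iPods in Grimes's comparison
STACK_MILES = 40                # "more than 40 miles" end to end
METERS_PER_MILE = 1609.344      # standard conversion factor

# Storage each iPod would need to hold for the comparison to work.
gb_per_ipod = TOTAL_STORAGE_GB / IPOD_COUNT

# Length each iPod would need for the stack to reach 40 miles.
cm_per_ipod = STACK_MILES * METERS_PER_MILE * 100 / IPOD_COUNT

print(gb_per_ipod)            # 160.0
print(round(cm_per_ipod, 1))  # 10.3
```

So the comparison implies devices of 160 GB each, a little over 10 cm long, and the two figures in the quote are consistent with each other.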
“On the competing frontier, currently there are only two search indexes now: Google and Bing. Bing’s index continues to improve, especially in the ‘long tail,’ where Google has been dominant,” said Greg Sterling, principal analyst at Sterling Market Intelligence. “Bing will also be competing with interfaces and feature upgrades; Google will be competing with ‘freshness.’ Yahoo says it will use the Bing index and innovate on top of it.”