San Francisco — Are your Twitter searches humming a bit faster these days? The micro-blogging site Twitter, in an attempt to deliver faster and more comprehensive search to its users, said Wednesday that it has replaced its entire search architecture, a change that apparently went unnoticed for weeks until the company pointed it out.
Although the casual user may not notice the difference, the backend update will enable the micro-blogging website to better handle the estimated 1 billion queries it receives each day.
The transition is now complete, and Twitter says the new backend performs much better, is significantly faster, and comes with a few enhancements for users as well.
Until this upgrade, Twitter’s real-time search engine was powered by software developed by Summize, a company that Twitter acquired in 2008. However, Twitter now handles 1,000 tweets per second, and the 2008 technology is no longer capable of managing the 12,000 queries that the micro-blogging site receives every second, which add up to about 1 billion queries per day.
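The per-second and per-day figures can be checked against each other with simple arithmetic. The short sketch below assumes, purely for illustration, that the quoted rates hold steadily for a full 24 hours; the function name `per_day` is hypothetical and not from Twitter's post.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(rate_per_second: int) -> int:
    """Convert a steady per-second rate into a daily total (simplifying assumption)."""
    return rate_per_second * SECONDS_PER_DAY


queries_per_day = per_day(12_000)  # 1,036,800,000 queries
tweets_per_day = per_day(1_000)    # 86,400,000 tweets
```

At a steady 12,000 queries per second the daily total comes to roughly 1.04 billion, which is consistent with the "over 1 billion" figure Twitter cites.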
Twitter worked on its back-end search for about six months and rolled out the changes over the last few weeks. “If we have done a good job then most of you should not have noticed that we launched a new backend for search on twitter.com during the last few weeks!” Twitter’s Michael Busch, one of the people involved in the project, wrote in a blog post.
“One of our main objectives, and also one of the biggest challenges, was a smooth transition from the old architecture to the new one, without any downtime or inconsistencies in search results,” he explained.
About six months ago, Twitter decided to develop a new, modern search architecture based on a highly efficient inverted index instead of a relational database. In the blog post, the San Francisco company said that the revamped search architecture is now based on Lucene, a search engine library written in Java. However, because of what it called several “shortcomings” in Lucene for real-time search, Twitter developers rewrote significant portions of the core in-memory data structures.
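For readers unfamiliar with the term, an inverted index maps each word to the list of documents (here, tweets) that contain it, so a query only has to look up its own terms rather than scan every record. The following is a minimal in-memory sketch of that idea in Python. It is not Twitter's or Lucene's implementation; the class name `InvertedIndex` and its methods are hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> posting list of tweet ids (illustrative only)."""

    def __init__(self):
        # Each posting list holds tweet ids in the order they were added.
        self._postings = defaultdict(list)

    def add(self, tweet_id, text):
        # Index each distinct term once per tweet.
        for term in set(text.lower().split()):
            self._postings[term].append(tweet_id)

    def search(self, query):
        """Return sorted ids of tweets containing every term in the query."""
        terms = query.lower().split()
        if not terms:
            return []
        # Intersect the posting lists of all query terms.
        result = set(self._postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self._postings.get(term, []))
        return sorted(result)


index = InvertedIndex()
index.add(1, "new search backend")
index.add(2, "search is faster")
index.add(3, "Lucene backend")
hits = index.search("search backend")  # only tweet 1 contains both terms
```

A production system adds much more, such as relevance ranking, compression, and concurrency control, which is where the Lucene modifications described in the post come in.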
The requirements for the new system are enormous, according to the blog: with more than 1,000 tweets per second (TPS) and 12,000 queries per second (QPS), which works out to more than 1 billion queries per day, Twitter already puts a very high load on its machines. Because the company wants the new system to last for several years, the goal was to support at least an order of magnitude more load. Twitter's modifications to Lucene include improved garbage collection performance, efficient early query termination, traversable posting lists, and lock-free data structures and algorithms, according to the blog.
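Early query termination is easiest to picture with a small example. In a real-time search, the newest matches usually matter most, so a query can walk a term's posting list from the most recent entry backwards and stop as soon as it has enough results. The sketch below assumes a posting list stored as tweet ids in arrival order; the function name `newest_matches` is hypothetical, and this illustrates the general idea rather than the approach Twitter actually took.

```python
def newest_matches(postings, limit):
    """Return up to `limit` tweet ids, newest first, stopping early.

    `postings` is assumed to hold tweet ids in ascending arrival order.
    """
    results = []
    if limit <= 0:
        return results
    # Walk the posting list backwards so the newest tweets come first.
    for tweet_id in reversed(postings):
        results.append(tweet_id)
        if len(results) == limit:
            break  # early termination: the rest of the list is never read
    return results


latest = newest_matches([1, 4, 7, 9, 12], 2)  # reads only the last two entries
```

The benefit is that query cost depends on how many results are requested, not on how long the posting list has grown.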
“And, before you ask, we are planning on contributing all these changes back to Lucene; some of which have already made it into Lucene’s trunk and its new realtime branch,” Twitter said.
Twitter asserts that the new architecture will benefit not only the company but users too. Users will notice a bigger index, which Twitter says is twice as long, without searches becoming any slower. Twitter also says the new technology will allow it to develop some “cool new features” faster and better.