Sunnyvale, California - The struggling internet pioneer Yahoo, attempting to regain its standing, has launched a project that the company hopes will provide a clearer picture of how email traffic travels across its networks. The program, dubbed the “Mail Visualization Project,” gives users an interactive world map view of where traffic is heading and how spam messages are being intercepted and blocked.
Built on a real-time sifting tool based on the Hadoop platform, the visualization provides a live report on which regions currently account for the highest volume of network usage. The site also includes graphs and allows visitors to drill down on map regions for more data about Yahoo Mail usage in specific parts of the world.
“Through the visualization project, we can illustrate the technology behind the Yahoo Network and show the impact that the tech has on you, the consumers of Yahoo’s great technology,” wrote Yahoo Labs engineer Markus Weimer and grid architect Andreas Neumann in a blog post outlining the project.
In addition, visitors can filter the data to see only the volume of legitimate messages or of blocked spam messages. Another feature shows the most popular subject line keywords.
On average, Yahoo Mail handles approximately 70,000 messages per second, or about 6 billion per day, sent by its 300 million users around the globe, said David McDowell, senior director of product management for Yahoo Mail.
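As a quick check of the figures above, a sustained rate of 70,000 messages per second does work out to roughly 6 billion per day. A minimal sketch in Python, with variable names chosen here purely for illustration:

```python
# Rate quoted in the article; the names below are illustrative only.
MESSAGES_PER_SECOND = 70_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

# Daily volume implied by a constant per-second rate.
messages_per_day = MESSAGES_PER_SECOND * SECONDS_PER_DAY

print(f"{messages_per_day:,} messages per day")  # 6,048,000,000 messages per day
```

This assumes the per-second average holds around the clock; actual traffic would vary by time of day and region.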
“Our technology analyses all this unspecified data with the help of Hadoop and identifies spam patterns so we can then use really intelligent algorithms to predict future email patterns that will differentiate ‘good’ and ‘bad’ senders.”
Recent reports indicate that spam traffic has begun to tail off slightly as a result of law enforcement projects and botnet shutdowns. Yahoo nonetheless reports that it still intercepts four out of every five messages on its mail service as spam, thanks in large part to the company’s adoption of Hadoop, the open source software framework for applications that manage massive amounts of data, which the company calls the “brain” behind Yahoo Mail.
“This site is about visualizing the amount of data we are handling to protect our users from spammers and phishers,” McDowell said.
All the data available on the site is in aggregate, anonymized form. None of it can be traced back to individual users, and third parties have no access to the site’s data. The data on the site is delayed by about one hour, so it is not displayed strictly in real time.
Hadoop has soared in popularity in recent years, and many vendors and enterprises have been considering integrating the platform for large-scale database work and for big data collection and analysis. Recently, Oracle made the Hadoop platform a key component in its much-publicised Big Data Appliance rollout.
Yahoo said that by adopting the Hadoop platform, it has been able to reduce spam reports by as much as 65 per cent.