Amazon to Sell Build-Your-Own Search Engine

December 11, 2005

Amazon.com, in a move with potentially far-reaching implications for the search market, has rolled out the Alexa Web Search Platform, opening up its huge web crawler to any programmer who wants paid access to its rich trove of internet data.

Nearly all the raw materials required to run a fairly complex Internet search service are now available, for lease, from online retailer Amazon.com Inc.

Amazon is opening up 5 billion documents and 300 terabytes of data to anyone, along with computing and storage time for processing tasks. The idea is to enable the creation of new services that use Alexa’s vast Web archive and search technology.

 

In its simplest form, Amazon is providing storage space and server power to users at a price of $1 per CPU hour consumed, $1 per gig of storage used, $1 per 50 gigs of data processed, and $1 per gig of data uploaded. A user will have access to the equivalent of a 3.6GHz Linux server with 4GB of memory.
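As a rough illustration of how these rates add up, the following Python sketch estimates a bill from the four quoted prices. The `Usage` class and `estimate_bill` function are hypothetical names invented for this example, and the sketch assumes simple linear pricing with no rounding or minimum charges, details the announcement does not spell out.

```python
from dataclasses import dataclass

# Rates as quoted in the article, in US dollars. How the beta actually rounds
# or batches charges is not stated, so plain linear pricing is assumed here.
DOLLARS_PER_CPU_HOUR = 1.0
DOLLARS_PER_STORED_GB = 1.0
PROCESSED_GB_PER_DOLLAR = 50.0  # $1 buys 50 gigabytes of data processed
DOLLARS_PER_UPLOADED_GB = 1.0


@dataclass
class Usage:
    """Hypothetical monthly usage record for one account."""
    cpu_hours: float
    stored_gb: float
    processed_gb: float
    uploaded_gb: float


def estimate_bill(usage: Usage) -> float:
    """Return the estimated charge in dollars under the quoted rates."""
    return (
        usage.cpu_hours * DOLLARS_PER_CPU_HOUR
        + usage.stored_gb * DOLLARS_PER_STORED_GB
        + usage.processed_gb / PROCESSED_GB_PER_DOLLAR
        + usage.uploaded_gb * DOLLARS_PER_UPLOADED_GB
    )


# Example: 10 CPU hours, 20 GB stored, 500 GB processed, 2 GB uploaded.
example = Usage(cpu_hours=10, stored_gb=20, processed_gb=500, uploaded_gb=2)
print(f"Estimated bill: ${estimate_bill(example):.2f}")  # Estimated bill: $42.00
```

Under these assumptions, processing is by far the cheapest line item: sifting through 500 gigabytes of crawl data costs the same as ten CPU hours.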

Alexa, best known for the web traffic statistics it provides for sites, originated as a search company founded in 1996 by Bruce Gilliat and Brewster Kahle. Amazon acquired Alexa in April 1999 and is now offering the technology as a web service to developers.

The Alexa Web Search Platform Beta will allow developers to “create new search services without having to invest millions of dollars in crawl, storage, processing, search, and server technology,” according to a statement by San Francisco-based Alexa Internet.

At the core of Amazon’s latest move is what the company believes is a rather novel idea for Internet search. But it rests on an old business model in which a company builds a product for other companies to buy or lease, develop further, and then brand as their own.

Manufacturing and communications services have long used this "white box" business model. The plus is that businesses don’t need to spend as much developing their products; the minus is the homogeneity of products all based on the same technology.

The move benefits Amazon because it is both a new revenue stream and a channel to draw more users to its Web operations.

The Alexa platform offers a giant data store for users, currently boasting 12 terabytes of space. This storage area is separate from Alexa’s own Web data, which consists of three separate snapshots of the Internet. Each snapshot includes 100 terabytes of files, and a new snapshot is rotated in every two months.

Amazon has put together an example photo search that takes advantage of the Alexa platform to seek out photos with very specific parameters such as camera model, exposure time and more. Such vertical searches are not currently possible on closed search platforms such as Google and Yahoo.
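To make the idea of a parameter-based photo search concrete, here is a minimal Python sketch that filters photo records by camera model and exposure time. It does not use the Alexa platform or reproduce Amazon's example; the `PhotoRecord` type, the `search_photos` function, and the sample data are all hypothetical, standing in for metadata a developer might extract from crawled image files.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PhotoRecord:
    """Hypothetical metadata record for one crawled photo."""
    url: str
    camera_model: str
    exposure_time: Fraction  # shutter time in seconds, e.g. 1/250


def search_photos(records, camera_model, max_exposure):
    """Return records from the given camera with exposure at or below the limit."""
    wanted = camera_model.casefold()  # case-insensitive model match
    return [
        record
        for record in records
        if record.camera_model.casefold() == wanted
        and record.exposure_time <= max_exposure
    ]


# Made-up sample data for illustration only.
SAMPLE = [
    PhotoRecord("http://example.com/a.jpg", "Canon EOS 20D", Fraction(1, 250)),
    PhotoRecord("http://example.com/b.jpg", "Nikon D70", Fraction(1, 60)),
    PhotoRecord("http://example.com/c.jpg", "Canon EOS 20D", Fraction(1, 30)),
]

hits = search_photos(SAMPLE, "canon eos 20d", Fraction(1, 100))
for hit in hits:
    print(hit.url)  # http://example.com/a.jpg
```

A real service would run this kind of filter over an index built from the crawl rather than an in-memory list, but the query shape, structured fields instead of keywords, is the same.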

In other words, Alexa and Amazon are turning the index inside out and offering it as a web service that anyone can mash up to their heart’s content. Entrepreneurs can use Alexa’s crawl, Alexa’s processors, Alexa’s server farm… the whole nine yards, writes search commentator John Battelle.

A debate has begun as to the response, if any, from Google Inc., Yahoo Inc., America Online and other major search engines, which for now make available only a tiny sliver of their secret algorithms for development purposes.

Special Crawls
The Alexa Web Search Platform will offer three online web snapshots of up to 100 terabytes each, along with tools for sifting through content so developers can create their own data sets.

While other search providers such as Google, Yahoo, and Microsoft’s MSN offer developers application programming interfaces and software developer kits that enable them to create customized applications, the Alexa offering will give them the ability to do specialized web crawls to search for content.

Rainer Typke, a web developer in the Netherlands, has already used the service to let visitors to his web site, musipedia.org, search for music by whistling a melody, according to reports in The Wall Street Journal.

Developers will be able to upload, compile, and run their own programs on a processing cluster, store their output on a storage cluster, integrate their data into a search index, and access their search through Seattle-based Amazon’s web services.

Because the Alexa Web Search Platform is currently in beta, only a limited number of accounts will be available. Signing up is free, and accounts will be billed once a month for the services used. Amazon account managers will help users get set up and answer any questions.

A FAQ and user guide are available detailing how AWSP can be used by developers.