Google, Microsoft, Yahoo! on “Canonical” URL: A New Approach to Tackle Duplicate Content

February 14, 2009

San Francisco - It is unusual to see the top three search companies act in unison. Google, Microsoft, and Yahoo have joined together to support a technique by which a little extra code in a Web page indicates the address of its “canonical” version, essentially the original, primary URL. The result is an unofficial standard for steering search engines in the right direction and eliminating duplicate content.

The move is meant to make it easy to tell search engines which page they should pay attention to, so that they avoid treating duplicate Web pages as different.

Google's stated goal is to organize the world’s information, a task that becomes extremely difficult when search engines are faced with duplicate content. Most websites have to deal with this issue, e-commerce sites in particular. One of the most common causes is a website that offers several routes to the same content and presents a different URL for each route.

Today, the search engine bots that crawl the Web for pages to index have no particular way of knowing whether they should point to “http://www.somepage.com/index.html” or “http://www.somepage.com/index.html?lang=en”, the latter with an optional extra tidbit at the end that tells the Web server to display the English-language version of a page.

This example shows just one of the many ways in which a site can unknowingly end up with duplicate content issues.
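A rough Python sketch of why the two addresses above look like separate pages to a crawler. The variable names are illustrative only and do not come from any search engine's code; the point is simply that the strings differ even though the host and path are identical.

```python
from urllib.parse import urlsplit, parse_qs

# The two addresses from the example above.
plain = "http://www.somepage.com/index.html"
with_lang = "http://www.somepage.com/index.html?lang=en"

a = urlsplit(plain)
b = urlsplit(with_lang)

# Compared as raw strings, the URLs are different, so a naive index
# would store them as two separate pages.
same_string = plain == with_lang

# Scheme, host, and path are identical; only the query string differs.
same_path = (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)

# The extra parameter carried by the second URL.
extra_params = parse_qs(b.query)

print(same_string, same_path, extra_params)
```

A crawler cannot safely discard the query string on its own, because a parameter such as lang may or may not change what the server returns. That is the guesswork the site owner's explicit hint removes.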

The new tag is an HTML link element that declares a relationship between a document and another resource, in this case the page's canonical, or preferred, URL form, which stands in for the sub-domain and URL variations under which the same page can appear. The tag tells search engine crawlers which URL form to use when indexing the page and returning it in search results.

The tag also puts the canonical URL at the forefront as the address used for the page, regardless of any session ID, link parameter, sort parameter, or parameter order in the URL.

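To implement the canonical URL form on a website, site owners just need to add a link tag of the following form to the head section of each page's HTML, with the href set to the preferred URL:

<link rel="canonical" href="http://www.example.com/products" />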

Once the link tag is in place, search engine crawlers will treat the following URL variants, which have the same content as the identified canonical URL, as duplicates of it:

http://www.example.com/products?trackingid=feed
http://www.example.com/products?sessionid=hgjkeor2
http://www.example.com/products?printable=yes&trackingid=footer
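How a crawler might act on the tag can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any search engine's actual implementation: CanonicalLinkParser, find_canonical, and PAGE are hypothetical names, and the canonical address http://www.example.com/products is assumed from the variants listed above.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first canonical link tag.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page content served for every variant (an assumption).
PAGE = (
    "<html><head>"
    '<link rel="canonical" href="http://www.example.com/products" />'
    "</head><body>Products</body></html>"
)

variants = [
    "http://www.example.com/products?trackingid=feed",
    "http://www.example.com/products?sessionid=hgjkeor2",
    "http://www.example.com/products?printable=yes&trackingid=footer",
]

# Group each crawled URL under the preferred URL its page declares,
# falling back to the crawled URL itself when no tag is present.
index = {}
for url in variants:
    preferred = find_canonical(PAGE) or url
    index.setdefault(preferred, []).append(url)

print(index)
```

All three variants end up filed under a single entry. Falling back to the crawled URL when a page declares nothing is a choice made for this sketch only.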

In all probability, most people will not notice much of a difference. Perhaps the URLs they click on in search results will be a bit shorter, and perhaps search results will be less cluttered with repeats of the same pages.

But the bigger advantages are for Webmasters, who can ensure a more consistent experience for people using their sites and collect cleaner data about how people use them, and for the search engines themselves, which will not have to make as many guesses about the pecking order of similar pages.

The alliance between Google, Microsoft and Yahoo! in support of the new canonical tag marks a major step towards tackling duplicate content on the web. The three search engine giants have previously agreed to support the sitemap protocol, in 2006, and a standard set of directives for the robots.txt file. Their collaboration on this occasion should mean less duplicate clutter on the internet, which will ultimately benefit not only webmasters but also users.

The Live Search Blog, Google Webmaster Blog, and Yahoo Search Blog each outline some technical details to consider when implementing the tag on your sites.