What does canonical mean?
According to a biology glossary from the University of Arizona, it is “repetitive; repeating in different forms”.
So, what does it mean to online publishers?
Simply, duplicate content.
How does duplicate content exist, what is its impact on search engines and, more importantly, how does it affect online publishers in search engine results?
I know that many of you think that search engines are “all knowing” when it comes to understanding content on the web, but the simple fact is that they are bound by what can be determined by mathematical principles and equations, also known as algorithms. This means that they determine things based upon exact measurements, even when the query is not mathematically based. The first level of complexity becomes visible: trying to determine a value for qualitative data by quantitative means. To arrive at a solution for a qualitative query, they must convert the qualitative data (content) and the qualitative request (keyword) into numeric values, measure them against each other, find the best answer (based upon their predetermined algorithm) and return the qualitative data to the user (results).
Now, we all know that there are a lot of metrics used in determining the quantitative value of a web page (domain, URL, location of page within the domain, relational or semantic value, inbound links, outbound links, domain authority, page authority, topical authority, content, domain age, content age and so on), but one of these has been a thorn in the side of both the search engines and the publishers of said content: duplication of content.
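To make the idea of turning words into numbers concrete, here is a toy sketch in Python. It is not how any real search engine scores pages; the `score` and `rank` functions and the term-frequency measure are assumptions chosen only to show content and a keyword query being reduced to numbers, compared, and returned as an ordered list.

```python
import re
from collections import Counter


def score(query: str, document: str) -> float:
    """Toy relevance score: share of the document's words that match query terms."""
    words = re.findall(r"[a-z0-9]+", document.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return sum(counts[term] for term in terms) / len(words)


def rank(query: str, documents: dict[str, str]) -> list[str]:
    """Return document names ordered from highest to lowest toy score."""
    return sorted(documents, key=lambda name: score(query, documents[name]), reverse=True)


if __name__ == "__main__":
    pages = {
        "page-a": "duplicate content hurts",
        "page-b": "fresh unique recipes for dinner",
    }
    print(rank("duplicate content", pages))
```

Real ranking blends far more signals than this, but the shape is the same: words go in, numbers are compared, and an ordered list of pages comes out.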
How can that be?
Simple: publishers want to make sure that the most relevant content is accessible to users, because when it comes down to it, users are the goal of every publisher. The same can be said about search engines. The user is the lifeblood of the Internet. Without them, this whole thing is pointless. The search engines want to return the most relevant results so the user will return when they are looking for something, and the publisher wants the user to find their products/services when they are looking for them.
This mostly affects publishers that use content management systems (CMS), like WordPress, to build dynamic websites from databases of product/page information. Now, whether you are publishing pages with IDs, tracking parameters, session IDs or anything else that gives one page differing URLs, you can “hint” to Google, Yahoo! and MSN/Live Search which page is the actual original, without having to define 301 redirects or risk losing your tracking, so they are less likely to index two pages that are essentially the same page.
OK, in an effort to KISS (keep it simple, stupid; me, of course), here are the simple answers to the three questions posed at the beginning of this post:
- How does duplicate content exist? Typically from parameters being passed in URL strings or from CMSs that publish the same page in multiple versions
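The hint itself is a `link` element with `rel="canonical"` placed in the `<head>` of each duplicate page, pointing at the preferred URL. As a minimal sketch, assuming a hypothetical helper named `canonical_link_tag` and an example.com URL invented for illustration, a template could build the element like this:

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>.

    The URL is HTML-escaped so an ampersand in a query string
    cannot break the attribute value.
    """
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


if __name__ == "__main__":
    # Hypothetical preferred URL for a product page that is also
    # reachable with tracking parameters and session IDs.
    print(canonical_link_tag("http://www.example.com/product.php?item=swedish-fish"))
```

Every variant of the page (with or without tracking parameters) would carry the same tag, so each one names the same original.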
- What is duplicate content’s impact on search engines? Pages with the same content and different URLs may not be recognized as the same page
- How does it affect online publishers in search engine results? Duplicate pages can each gain value, splitting the total value across multiple pages instead of concentrating it on one
So, it’s a bright and shiny day: publishers can let search engines know which page is the most important, and the search engines can stop wasting space by storing the same page over and over again, right?
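If you want to see whether your own site has this problem, one rough approach is to fingerprint each page body and group the URLs that share a fingerprint. The sketch below assumes you already have the page text keyed by URL; the `group_duplicates` name and the sample URLs are made up for illustration, and an exact hash like this only catches identical text, not near-duplicates. It is also not a description of how the search engines detect duplicates.

```python
import hashlib
from collections import defaultdict


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose page text is identical after whitespace and case cleanup."""
    groups = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and lowercase so trivial formatting differences
        # do not hide an otherwise identical page.
        cleaned = " ".join(body.split()).lower()
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    crawled = {
        "http://www.example.com/page?id=7": "Swedish Fish, 1 lb bag",
        "http://www.example.com/page?id=7&sid=abc123": "Swedish  Fish, 1 lb bag",
        "http://www.example.com/about": "About our candy shop",
    }
    for urls in group_duplicates(crawled):
        print(urls)
```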
Not so fast!
Citing the presentation deck announcing the new link tag, slide 8:
- This is a hint, not a directive/mandate/requirement. Search engines choose when to use the suggestion
- Far better to avoid dupes and normalize urls in the first place
- If you’re a power user, exhaust alternatives first
- Be careful. Regular bloggers/websites may want to wait for their software to be updated
- If we see abuse, we reserve the right to react as needed
So, you will be best served by finding the answer to the real question: how can I accomplish what I am trying to do without impacting my rankings? Specifically, follow point two: avoid duplicate pages and normalize page URLs in the first place.
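What normalizing looks like depends on your site, but as a rough sketch in Python using the standard `urllib.parse` module: drop the parameters that do not change the page, sort what is left, and lowercase the host. The `IGNORED_PARAMS` list and the `normalize_url` function are assumptions for illustration; your own tracking and session parameter names will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sid", "ref"}


def normalize_url(url: str) -> str:
    """Return one consistent form of a URL so duplicates collapse to a single address."""
    parts = urlsplit(url)
    # Keep only parameters that affect content, in a stable (sorted) order.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Lowercase the scheme and host, and drop the fragment (#...) entirely.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), ""))


if __name__ == "__main__":
    print(normalize_url("http://WWW.Example.com/page?id=7&utm_source=feed&sid=abc123#top"))
```

Used when generating internal links, a function like this keeps your own site from creating the duplicate URLs in the first place, which is the point the slide is making.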
News about the new canonical tag
Google Webmaster Central Blog
Yahoo! Search Blog
Live Search Webmaster Center Blog
Search Engine Land
Search Engine Watch
Search Engine Roundtable
Search Engine Journal
SEOmoz
Marketing Pilgrim
Matt Cutts
Yoast – WordPress Plugin to fix canonical pages
WebProNews Video – Interview with Matt Cutts
{ 30 comments }
Glad the engines came out with this but I still feel it doesn’t help if your site has a sloppy architecture. Interesting nevertheless.
The duplicate content issue is one of the major concerns in search engine optimisation. However, it is quite easy to go overboard and pay too much attention to duplicate content and search engine ranking. Article marketing is one such area where it is not feasible to write a different version of an article for each submission.
Search engine algorithms are becoming more intelligent with time. Search engines are much better at differentiating between duplicate factors arising from CMSs and blogs, for example.
I don’t really understand why people want to duplicate content, since Google is always improving its search algorithm to rank duplicated content lower. I never duplicate content because I know my site will do better with original and unique content.
I had a client that couldn’t figure out why one of their pages would not rank in Google, no matter what they did. They built links, had PR, the rest of their site was ranking…. I put them in CopyScape, saw they had duplicate content on the page and asked them to change it. It turned out that he had to put up content based on the manufacturer’s product description, and he actually had to get their permission in order to change the content. As soon as he did change the content, the duplicate content penalty was lifted and the page was ranking within 3 weeks. Just thought that was interesting and something I wanted to share with future clients who may have a manufacturer-required piece of content on their site, who may not know any better and not realize it could be looked at as duplicate content.
Great article!
I had the same issue as SEO Link Builder. As soon as the duplicate content was changed, the page gained a better ranking in no time.
I would suggest this very helpful tool to check your duplicate content:
http://www.copyscape.com
Regards,
Phil
There is plenty of proof on the net that duplicate content has no effect on the ranking of websites!! Yes, it is true! You can submit the same article to many article directories without any problem, and all those article backlinks will be counted by G. The important point is not to use duplicated content on different pages of the same website.
Good Luck
It is always better to maintain unique articles rather than copied or duplicated contents. Very nice post indeed!
Duplicate content is a sure way to get your website ranked poorly in the search engines. I always try to write unique content not just for search engine rank, but because it offers my visitors a reason to keep returning to my websites.
There are times when duplicate content doesn’t really have much effect on rankings. But there are also times when Google puts some sort of tracking device in your site, and when you make a little mistake, your rankings are affected. Duplicate or not, in the end, it’s how relevant the content is to the site that matters.
A very good review of search engine optimization and some good notes on content duplication. Thank you.
We try to avoid duplicating content, but it is hard when you focus on a specific term like short sales, as we do. People scraping our content from our blogs has always been a concern under this rule, but I always thought that if our site’s pages were indexed first there might be some kind of credit given for that.
Thanx for the Article Pitfall.
I am in the process of designing a new site and researching SEO and duplicate content so that I can optimize my pages for the search engines, sooner rather than later.
Nice article, very useful for me as a newcomer in the blogging world. Thank you.
First of all, duplicate content appears as an alternative for web directors to prepare a website for the best appearance in Google. I tested that and it is still working, so until Google develops some better way to see the tricky things in the hidden pages of my websites, I am going to aim for the best spot in Google searches with any weapon that I have, including duplicate content.
Duplicate content is definitely an area to look out for, especially when dealing with product specifications.
Thanks for the informative post. Duplicate content has recently killed one of my forums. I blame duplicate site descriptions for bringing my directory from PR5 to PR2 during the last update.
As newbie web publishers we have been reading all we can about the pitfalls to avoid in promoting our privacy site, and duplicate content seems to be a key one. We would like to add copies of the wording from various privacy policies to our site, but a number of them have the same clauses in them, so we were concerned about duplicate content.
@Privacy
Thanks for the comment. If you are looking to rank for these privacy policy keywords, it might be a concern. However, as I am sure is the case, most of these sections of content are unique in some manner. Duplicate content is a concern when all of it is duplicated.
I really like Joost’s plugin for taking care of this task. This is really among his best work when it comes to improving WordPress. I am now using this and many of his other ones on my blogs.
There was an issue about google sandbox and duplicate content.
Thanks for your advice.
Does canonical always stand for duplicate content? http://www.epiki.com/what-is-relcanonical-attribute has another point about similar or identical URLs.
Does adding the canonical link tag to a page have the same basic end result (preventing duplicate indexing) as using mod_rewrite and the .htaccess file to “force” a non-www URL to do a 301 redirect to the full URL (one that contains www)? I have no access to the server/OpenCMS admin environment we’re running on and am desperately looking for some way to achieve this effect by putting something in our index JSP page (so that “domain.com” and “www.domain.com/index.html” and “domain.com/index.html” all resolve to “www.domain.com”).
Well, everyone talks about duplicate content and how it affects search engine ranking. But the thing is that the search engine checks whether a new webpage consists of duplicate content by comparing it with the webpages already in its index. This is done by an algorithm that runs the comparison, and only if the duplicate content percentage is anywhere above 20-30% is your webpage penalized for carrying duplicate content.
Some duplicate content may cause pages to be filtered at the time search engines serve results, and there is no guarantee as to which version of a page will show in results and which versions won’t. Duplicate content may also lead to some sites and some pages not being indexed by search engines at all, or may result in a search engine crawling program stopping the indexing of all the pages of a site because it finds too many copies of the same pages under different URLs.
You made some good points here. Keep us posted. Excellent article; I am sure that I will come back here soon. What template do you use on your site?
The duplicate content issue is one of the major concerns in search engine optimisation. I did read this blog and it was very interesting. I liked the second part the most.
Duplicate content issue is a major concern in the search engine optimization. However, it is only very easy to go overboard and pay too much attention to duplicate content and search engine rankings. Article marketing is one of those areas where it is possible to write a different version of the article for each presentation.
Search engine algorithms are becoming more intelligent with time. Search engines are much better in differentiating between duplicate factors arising due to CMS and blogs for example. Really good quote.