Die Duplicate Content Die!

by pittfall on February 16, 2009

What does canonical mean?
According to a biology glossary from the University of Arizona, it is “repetitive; repeating in different forms”.

So, what does it mean to online publishers?
Simply put: duplicate content.

How does duplicate content come to exist, what is its impact on search engines and, more importantly, how does it affect online publishers in search engine results?

I know that many of you think that search engines are “all knowing” when it comes to understanding content on the web, but the simple fact is that they are bound by what can be determined through mathematical procedures, also known as algorithms. This means they decide things based on exact measurements, even when the query itself is not mathematical. Here the first level of complexity becomes visible: trying to assign a value to qualitative data by quantitative means. To answer a qualitative query, a search engine must convert the qualitative data (the content) and the qualitative request (the keyword) into numeric values, measure them against each other, find the best answer (according to its predetermined algorithm) and return the qualitative data to the user (the results).
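
To make the “qualitative into quantitative” idea concrete, here is a toy sketch of the textbook vector-space approach: the keyword and each page are turned into word-count vectors and compared with cosine similarity. This is only an illustration of the general principle under simplifying assumptions; it is not how Google, Yahoo! or Live Search actually score pages, and the sample URLs and text are made up.

```python
import math
import re
from collections import Counter


def to_vector(text: str) -> Counter:
    """Turn qualitative text into a quantitative bag-of-words vector (term -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Score how closely two term vectors point in the same direction (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, pages: dict[str, str]) -> list[tuple[str, float]]:
    """Convert query and pages to numbers, compare them, and return URLs best-first."""
    q = to_vector(query)
    scores = [(url, cosine_similarity(q, to_vector(body))) for url, body in pages.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    pages = {
        "/vacuum-bags": "Replacement vacuum bags for upright vacuums",
        "/garden-stools": "Ceramic garden stools in many colors",
    }
    for url, score in rank("vacuum bags", pages):
        print(f"{score:.3f}  {url}")
```

Here /vacuum-bags ranks first simply because it is the only page sharing words with the query; real engines combine this kind of text match with many other signals.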

Now, we all know that a lot of metrics are used in determining the quantitative value of a web page (domain, URL, location of the page within the domain, relational or semantic value, inbound links, outbound links, domain authority, page authority, topical authority, content, domain age, content age and so on), but one of them has been a thorn in the side of both the search engines and the publishers of that content: duplicate content.

How can that be?
Simple: publishers want to make sure that their most relevant content is accessible to users, because when it comes down to it, users are the goal of every publisher. The same can be said about search engines. The user is the lifeblood of the Internet; without users, this whole thing is pointless. Search engines want to return the most relevant results so users will come back the next time they are looking for something, and publishers want users to find their products and services when they are looking for them.

This mostly affects publishers that use content management systems (CMS), such as WordPress, to build dynamic websites from databases of product or page information. Whether you are publishing pages with IDs, tracking parameters, session IDs or any other scheme that serves one page under differing URLs, the new canonical link tag (rel="canonical") lets you “hint” to Google, Yahoo! and MSN/Live Search which URL is the original page. You can do this without setting up 301 redirects or risking the loss of your tracking, and the engines become less likely to index two pages that are essentially the same page.
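
Here is a minimal sketch of what such a hint could look like when generated by a site, assuming a PHP-style product URL and a hypothetical list of tracking and session parameter names (NOISE_PARAMS); your own CMS will append its own parameters. The function strips those parameters and emits the <link rel="canonical"> element that belongs in the page's <head>.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters this example site treats as tracking or session noise.
# Adjust to whatever your own CMS actually appends.
NOISE_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign", "ref"}


def canonical_url(requested: str) -> str:
    """Drop noise parameters so every variant of a page maps to one URL."""
    parts = urlsplit(requested)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NOISE_PARAMS
    ]
    # Sort the remaining parameters so ?b=2&a=1 and ?a=1&b=2 produce the same URL.
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def canonical_link_tag(requested: str) -> str:
    """Build the <link rel="canonical"> hint, HTML-escaping the URL for the attribute."""
    return f'<link rel="canonical" href="{escape(canonical_url(requested))}" />'


if __name__ == "__main__":
    print(canonical_link_tag(
        "http://www.example.com/product.php?item=swedish-fish&sessionid=123&utm_source=feed"
    ))
```

Because every variant of the URL produces the same tag, the engines are shown one suggested original whichever version they crawl, while the visitor's own URL (and therefore your tracking) is left untouched.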

OK, in an effort to KISS (keep it simple, stupid), the stupid being me, of course, here are the simple answers to the three questions posed at the beginning of this post:

  1. How does duplicate content exist? Typically from parameters passed in URL strings, or from CMSs that publish the same page in multiple versions.
  2. What is duplicate content’s impact on search engines? Pages with the same content and different URLs may not be recognized as the same page.
  3. How does it affect online publishers in search engine results? Each duplicate page can gain value of its own, diverting the total value across multiple URLs instead of concentrating it on one page.
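
Answers 1 and 2 can be illustrated with a short sketch: several URLs that differ only by parameters serve the same content, and nothing in the URLs themselves reveals that. One simple way to spot such exact copies is to hash each page's normalized text and group URLs that share a hash. This is an illustration under stated assumptions (the regex tag stripper is deliberately naive, and it only catches identical copies, not near-duplicates); it is not a description of how any search engine actually detects duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html_text: str) -> str:
    """Hash page text after crude normalization (strip tags, collapse whitespace, lowercase)."""
    text = re.sub(r"<[^>]+>", " ", html_text)  # naive tag stripper, illustration only
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized content is identical; singletons are ignored."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical crawl results: the first two URLs are the same product page.
    crawled = {
        "/product.php?item=42": "<h1>Blue Widget</h1><p>Ships today.</p>",
        "/product.php?item=42&sessionid=abc": "<h1>Blue Widget</h1>\n<p>Ships today.</p>",
        "/about": "<h1>About us</h1>",
    }
    print(find_duplicates(crawled))
```

Each group returned is a set of URLs competing for the same value, which is exactly the situation answer 3 describes.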

So, it’s a bright and shiny day: publishers can tell search engines which page is the most important, and search engines can stop wasting space storing the same page over and over again, right?

Not so fast!
Citing the presentation deck announcing the new link tag, slide 8:

  • This is a hint, not a directive/mandate/requirement. Search engines choose when to use the suggestion
  • Far better to avoid dupes and normalize urls in the first place
  • If you’re a power user, exhaust alternatives first
  • Be careful. Regular bloggers/websites may want to wait for their software to be updated
  • If we see abuse, we reserve the right to react as needed

So, you will be best served by answering the real question: how can I accomplish what I am trying to do without hurting my rankings? Specifically, follow point two: avoid duplicate pages and normalize page URLs in the first place.
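
Normalizing URLs in the first place usually means picking one form for every URL and permanently (301) redirecting everything else to it. Here is a minimal sketch, assuming you control the server and prefer the hypothetical host www.example.com: it lowercases the scheme and host, folds the bare domain onto www, drops default ports, trims a trailing directory-index file name and discards the fragment. The names normalize and redirect_for are illustrative only, not part of any framework; how you wire them into your server or CMS is up to you.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for an example site; replace with your own.
PREFERRED_HOST = "www.example.com"
INDEX_PAGES = ("index.html", "index.htm", "index.php")
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url: str) -> str:
    """Rewrite a URL into the single form we want search engines to index."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # .hostname is returned lowercased and without the port or any user info.
    host = parts.hostname or ""
    # Fold the bare domain (example.com) onto the preferred www host.
    if "www." + host == PREFERRED_HOST:
        host = PREFERRED_HOST
    # Keep the port only if it is not the default for the scheme.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # /shop/index.html and /shop/ are the same page; keep the shorter form.
    for index in INDEX_PAGES:
        if path.endswith("/" + index):
            path = path[: -len(index)]
            break
    # The fragment (#...) never reaches the server, so it is dropped.
    return urlunsplit((scheme, host, path, parts.query, ""))


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the URL is not already normalized, otherwise None."""
    target = normalize(url)
    return (301, target) if target != url else None


if __name__ == "__main__":
    for requested in ("http://example.com/index.html", "http://www.example.com/"):
        print(requested, "->", redirect_for(requested))
```

The first request is sent a 301 to http://www.example.com/, while the second is already in normalized form and is served as-is, so only one version of the page ever gets indexed.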

News about the new canonical tag
Google Webmaster Central Blog
Yahoo! Search Blog
Live Search Webmaster Center Blog
Search Engine Land
Search Engine Watch
Search Engine Roundtable
Search Engine Journal
SEOmoz
Marketing Pilgrim
Matt Cutts
Yoast: WordPress Plugin to fix canonical pages
WebProNews Video – Interview with Matt Cutts

1 trackback

Cartoon Barry Blog
February 17, 2009 at 10:33 am

30 comments

1 Grand Rapids Web Design February 17, 2009 at 3:00 pm

Glad the engines came out with this, but I still feel it doesn’t help if your site has a sloppy architecture. Interesting nevertheless.

2 Web Design Company February 17, 2009 at 4:18 pm

The duplicate content issue is one of the major concerns in search engine optimisation. However, it is quite easy to go overboard and pay too much attention to duplicate content and search engine rankings. Article marketing is one such area where it is not feasible to write a different version of an article for each submission.

3 Website News February 17, 2009 at 4:20 pm

Search engine algorithms are becoming more intelligent with time. Search engines are much better at differentiating between duplicates arising from CMSs and blogs, for example.

4 anilime February 18, 2009 at 4:17 pm

I don’t really understand why people want to duplicate content, since Google is always improving its search algorithm to push duplicated content lower. I never duplicate content because I know my site will do better with original and unique content.

5 SEO Link Builder February 19, 2009 at 5:20 pm

I had a client who couldn’t figure out why one of their pages would not rank in Google, no matter what they did. They built links, had PR, and the rest of their site was ranking. I put the page into Copyscape, saw it had duplicate content and asked them to change it. It turned out that he had been required to put up content based on the manufacturer’s product description, and he actually had to get the manufacturer’s permission in order to change it. As soon as he did change the content, the duplicate content penalty was lifted and the page was ranking within 3 weeks. Just thought that was interesting and something I wanted to share with future clients who may have a manufacturer-required piece of content on their site, who may not know any better and may not realize it could be looked at as duplicate content.

6 Phil February 20, 2009 at 4:36 am

Great article!

I had the same issue as SEO Link Builder. As soon as the duplicate content was changed, the page gained a better ranking in no time.

I would suggest this very helpful tool to check your duplicate content:
http://www.copyscape.com

Regards,
Phil

7 Dedicated Windows Server Hosting February 20, 2009 at 11:40 am

There is a lot of proof on the net that duplicate content has no effect on the ranking of websites!! Yes, it is true! You can submit the same article to many article directories without any problem, and all of those article backlinks will be counted by G. The important point is not to use duplicated content on different pages of the same website.
Good Luck

8 Minnesota Lawyer February 21, 2009 at 12:39 am

It is always better to maintain unique articles rather than copied or duplicated content. Very nice post indeed!

9 Matt February 22, 2009 at 2:56 pm

Duplicate content is a sure way to get your website ranked poorly in the search engines. I always try to write unique content not just for search engine rank, but because it offers my visitors a reason to keep returning to my websites.

10 Four Eyes Squad February 23, 2009 at 10:28 am

There are times when duplicate content doesn’t really have much effect on rankings. But there are also times when Google puts some sort of tracking device on your site, and when you make a little mistake, your rankings are affected. Duplicate or not, in the end, it’s how relevant the content is to the site that matters.

11 Bob February 26, 2009 at 9:32 am

A very good review of search engine optimization and some good notes on content duplication. Thank you.

12 Short Sales February 28, 2009 at 8:56 pm

We try to avoid duplicating content, but it is hard when you focus on a specific term like short sales, as we do. People scraping content from our blogs has always been a concern with this rule, but I always thought that if our site’s pages were indexed first, there might be some kind of credit given for that.

13 Davey March 4, 2009 at 4:29 am

Thanks for the article, pittfall.

I am in the process of designing a new site and researching SEO and duplicate content so that I can optimize my pages for the search engines sooner rather than later.

14 Dajat March 4, 2009 at 7:28 pm

Nice article, very useful for a newcomer to the blogging world like me. Thank you.

15 botez March 6, 2009 at 6:58 am

First of all, duplicate content appeared as an alternative for web directors to prepare their websites for the best appearance in Google. I tested it and it is still working, so until Google develops some better way to see the tricky things in the hidden pages of my websites, I am going to aim for the best spot in Google searches with any weapon I have, including duplicate content.

16 Nick March 7, 2009 at 9:03 am

Duplicate content is definitely an area to look out for, especially when dealing with product specifications.

17 css forum March 7, 2009 at 9:16 pm

Thanks for the informative post. Duplicate content has recently killed one of my forums. I blame duplicate site descriptions for bringing my directory from PR5 to PR2 during the last update.

18 Privacy Policy Template March 14, 2009 at 11:07 am

As newbie web publishers, we have been reading all we can about the pitfalls to avoid in promoting our privacy site, and avoiding duplicate content seems to be a key one. We would like to add copies of the wording from various privacy policies to our site, but a number of them have the same clauses in them, so we were concerned about duplicate content.

19 pittfall March 14, 2009 at 11:28 am

@Privacy
Thanks for the comment. If you are looking to rank for these privacy policy keywords, it might be a concern; however, as I am sure is the case, most of these sections of content are unique in some manner. Duplicate content is a concern when all of it is duplicated.

20 Brad Hart March 15, 2009 at 10:56 am

I really like Joost’s plugin for taking care of this task; this is really among his best work when it comes to improving WordPress. I am now using this and many of his other ones on my blogs.

21 Web Bisnis June 19, 2009 at 8:48 pm

There was an issue about the Google sandbox and duplicate content.

Thanks for your advice.

22 techguide1 July 14, 2009 at 12:46 am

Does canonical always stand for duplicate content? http://www.epiki.com/what-is-relcanonical-attribute makes another point about similar or identical URLs.

23 Tom August 14, 2009 at 12:10 pm

Does adding the canonical link tag to a page have the same basic end result (preventing duplicate indexing) as using mod_rewrite and the .htaccess file to “force” a non-www URL to do a 301 redirect to the full URL (one that contains www)? I have no access to the server/OpenCMS admin environment we’re running on and am desperately looking for some way to achieve this effect by putting something in our index JSP page (so that “domain.com” and “www.domain.com/index.html” and “domain.com/index.html” all resolve to “www.domain.com”).

24 Designz Today January 18, 2010 at 8:57 am

Well, everyone talks about duplicate content and how it affects search engine rankings. But the thing is that the search engine checks whether a new webpage contains duplicate content by comparing it with the webpages already in its index. This is done by a comparison algorithm, and only if the duplicate content percentage is somewhere above 20-30% is your webpage penalized for carrying duplicate content.

25 Riccar vacuum bags March 15, 2010 at 4:03 pm

Some duplicate content may cause pages to be filtered when search engines serve results, and there is no guarantee as to which version of a page will show in results and which versions won’t. Duplicate content may also lead to some sites and some pages not being indexed by search engines at all, or may result in a search engine’s crawler stopping the indexing of all of the pages of a site because it finds too many copies of the same pages under different URLs.

26 Sweet sixteen ideas May 11, 2010 at 7:10 am

You made some good points here. Keep us posted. Excellent article; I am sure that I will come back here soon. What template do you use on your site?

27 Marine fuel tanks May 16, 2010 at 1:36 am

There are times when duplicate content doesn’t really have much effect on rankings. But there are also times when Google puts some sort of tracking device on your site, and when you make a little mistake, your rankings are affected.

28 First up gazebo May 19, 2010 at 3:42 pm

The duplicate content issue is one of the major concerns in search engine optimisation. I read this blog and it was very interesting. I liked the second part the most.

29 Ceramic garden stool May 30, 2010 at 1:04 pm

The duplicate content issue is a major concern in search engine optimization. However, it is very easy to go overboard and pay too much attention to duplicate content and search engine rankings. Article marketing is one of those areas where it is possible to write a different version of the article for each presentation.

30 Rocking chairs for nursery October 20, 2010 at 2:55 pm

Search engine algorithms are becoming more intelligent with time. Search engines are much better at differentiating between duplicates arising from CMSs and blogs, for example. Really good quote.
