Google Gives Clarification on Duplicate Content

Posted By Darren Rowse 21st of December 2006 Search Engine Optimization 0 Comments

If you’ve ever wondered what Google does and doesn’t classify as ‘duplicate content’, you might find this explanation on their official Webmaster blog useful.

A few snippets:

“What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries….

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index….

Don’t fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it’s highly unlikely that such sites can negatively impact your site’s presence in Google. If you do spot a case that’s particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.”

This last part will undoubtedly calm a few nervous bloggers I know who worry about their content being scraped and republished on other people’s blogs, and about being penalized for it as a result.
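The quoted passage notes that Google only has to pick between versions when neither is blocked in robots.txt or via a noindex meta tag. If you’d rather choose which version gets listed yourself, a minimal sketch is to add a robots meta tag to the printer-friendly template (the template and path here are hypothetical examples, not anything from the post):

```html
<!-- Place in the <head> of the printer-friendly template only:
     tells crawlers not to index this version of the page, while
     still following its links through to the regular pages -->
<meta name="robots" content="noindex, follow">
```

Alternatively, if your printer versions all live under one directory, a robots.txt rule such as `Disallow: /print/` (again, a hypothetical path) keeps them from being crawled at all.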

About Darren Rowse
Darren Rowse is the founder and editor of ProBlogger Blog Tips and Digital Photography School. Learn more about him here and connect with him on Twitter, Facebook, Google+ and LinkedIn.
  • Any opinion on whether the “DupPrevent” WordPress plugin is worthwhile?

  • Good to know; makes being organized even more important!

  • And what about redirects? I have my own domain registered that redirects to my Typepad blog, and I’ve been told before that Google doesn’t like the type of redirect it uses because it’s classified as “temporary”.

  • That is welcome news about the scraping… our site usually has more links from nasty little sorts who are grabbing random text than it does from legit places. I’ve had more than one fantasy about dumping a bucket of pixels over their heads…

  • “Duplicate Content” is an interesting euphemism. They called it plagiarism when I was in school. I understand that some lack the language with which to express themselves, but really! Stealing verbatim the work of another is simply tacky. Reprehensible, even. An exception to the “Imitation is the greatest form of compliment” adage.

  • Great post! I know a lot of bloggers are wondering about this. The bottom line always remains the same: if you are honest, you have nothing to worry about!

  • The scraping of content via RSS feeds is annoying to say the least, especially if you regularly write unique content yourself and then someone creates a blog, uses your RSS feed, and slaps AdSense all over your content.

    It’s good to know that Google is aware of such sites.

  • Thanks for that, Seth. I’ve always wondered about this point too, so thanks for pointing it out.
    And have a great Xmas!

  • Hmm… so it’s more likely they’ll filter than kick a site out of the index.

    I have a query:
    Say someone owns a PR 6 site, and they pick up an article from a very new blog within an hour of it being posted. It’s likely that Google will crawl and index the PR 6 site much earlier than that blog page.

    Does that mean the blog (or a PR 0 site) will be filtered out of the results? They didn’t make it clear how they filter.

    Do you have any idea?

  • I set up a site that serves as a repository of all the articles I post across all the blogs I write to – I guess this isn’t allowed?

  • Pingback: Optimizare

  • George

    Very good news; now I don’t have to worry about one site I found this week that was scraping posts from one of my sites and winding up on Topix.

  • Pingback: Devlounge | For the Love of the Web, Please Use Full Content Feeds!

  • Exactly what I was looking for. Thanks, man!