Google Gives Clarification on Duplicate Content

Posted by Darren Rowse, 21st of December 2006, in Search Engine Optimization

If you’ve ever wondered what Google does and doesn’t classify as ‘duplicate content’, you might find this explanation on their official Webmaster blog useful.

A few snippets:

“What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries….

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index….

Don’t fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it’s highly unlikely that such sites can negatively impact your site’s presence in Google. If you do spot a case that’s particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.”

This last part will undoubtedly calm a few nervous bloggers I know who worry that they’ll be penalized when their content is scraped and republished on other people’s blogs.
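Google’s note says you can block one set of duplicates, such as printer-friendly versions, in robots.txt. For a rough illustration, here is a small Python sketch that uses the standard library’s `urllib.robotparser` to check which URLs a robots.txt rule would block. The `/print/` path and the example.com URLs are hypothetical. Python’s parser does only simple prefix matching, so it may not read every rule exactly the way Googlebot does. The sketch also doesn’t cover the noindex meta tag, which is the other option the note mentions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose printer-friendly copies live under /print/
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Compare the regular article URL with its (hypothetical) printer version
    for url in (
        "https://example.com/articles/duplicate-content",
        "https://example.com/print/articles/duplicate-content",
    ):
        status = "crawlable" if rp.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", status)
```

With this rule, only the regular article stays crawlable. That leaves a single version of each article for the search engine to choose from.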

About Darren Rowse

Darren Rowse is the founder and editor of ProBlogger Blog Tips and Digital Photography School.

Comments

Ross M Karchner says: 12/21/2006 at 6:58 am

Any opinion on whether the “DupPrevent” WordPress plugin is worthwhile?
Grayson De Ritis says: 12/21/2006 at 7:28 am

Good to know; makes being organized even more important!
Nathan Hughes says: 12/21/2006 at 7:33 am

And what about redirects? I have my own domain registered that redirects to my Typepad blog, and I’ve been told before that Google doesn’t like the type of redirect that GoDaddy.com uses because it’s classified as “temporary”.
Dan del Villano says: 12/21/2006 at 8:00 am

That is welcome news about the scraping… our site usually has more links from nasty little sorts who are grabbing random text than it does from legit places. I’ve had more than one fantasy about dumping a bucket of pixels over their heads…
Windows Shopper says: 12/21/2006 at 9:33 am

“Duplicate Content” is an interesting euphemism. They called it plagiarism when I was in school. I understand that some lack the language with which to express themselves, but really! Stealing verbatim the work of another is simply tacky. Reprehensible, even. An exception to the “Imitation is the greatest form of compliment” adage.
Gangster says: 12/21/2006 at 10:19 am

Great post! I know a lot of bloggers are wondering about this. The bottom line always remains the same: if you are honest, you have nothing to worry about!
Darren Cronian says: 12/21/2006 at 11:38 am

Scraping content via RSS feeds is annoying, to say the least, especially if you regularly write unique content yourself and then someone creates a blog, pulls in your RSS feed and slaps AdSense all over your content.

It’s good to know that Google is aware of such sites.
Stephen Fowler says: 12/22/2006 at 12:17 am

Thanks for that, Seth. I’ve always wondered about this point too, so thanks for pointing it out.
And have a great Xmas!
Arpit Tambi says: 12/22/2006 at 6:51 am

Hmm... so it’s more likely they’ll filter a page than kick it out of the index…

I have a question…
Say someone owns a PR 6 site and picks up an article from a very new blog within an hour of it being posted. It’s quite likely that Google will crawl and index the PR 6 site much earlier than the blog page.

Does that mean the blog (or any PR 0 site) will be filtered out of the results? They didn’t make it clear how they filter.

Do you have any idea?
Clark says: 12/22/2006 at 7:22 pm

I set up a site that serves as a repository of all the articles I post across all the blogs I write for – I guess this isn’t allowed?
Optimizare says: 12/25/2006 at 1:49 am

Custom search engine for SEOs…

I used Google’s customization feature to build a search engine that includes only the sites of interest (at least the ones I follow, whether often or occasionally) – SEO Community Search Engine.
I made it primarily for …
George says: 12/26/2006 at 4:59 pm

Very good news. Now I don’t have to worry about a site I found this week that was scraping posts from one of my sites, which wound its way to Topix.
Devlounge | For the Love of the Web, Please Use Full Content Feeds! says: 01/06/2007 at 1:31 am

[…] People (or bots) are ripping the content. Google has made it clear that sploggers will not affect your ranking. It is easier to steal content using a full content feed, but why let a few bastards stealing content ruin it for the rest of the readers? Furthermore, if people want to rip-off your content, they will. A partial feed is not going to stop a ripper from going to your site, going to View-Source, and copying your content. […]
John M Weaver says: 01/12/2007 at 3:15 pm

Exactly what I was looking for. Thanks, man!