Earlier this week Google’s head of web spam, Matt Cutts, posted on his blog that they’re implementing a change to their algorithm that affects those who publish content from elsewhere on the Web.
The changes are all about ranking the original sources of content higher than those who scrape, republish, or copy it. This has always been Google’s intent, but recently scraped content has increasingly been seen ranking higher than the original sources.
In Matt’s words:
“The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.”
This has a couple of implications for bloggers of different types.
For those who produce blogs with original content, it hopefully means no longer being out-ranked by other sites reproducing that content (with or without permission). As someone who finds his own content appearing on other sites many times a day (often without credit to the source), I see this as a welcome change.
For those who do use scraping (or syndication) strategies, this news might stimulate a rethink of that approach. I know there are times and places for syndication (particularly if you do so with permission), but this serves as a reminder that in most cases, if you’re looking to build a prominent and successful blog, you need to produce something that’s not only relevant and useful, but also unique.
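If you do syndicate with permission, one safeguard worth asking for is a cross-domain canonical link on the republished copy pointing back at your original post, a hint Google has supported for syndicated content since 2009. Below is a minimal sketch, with hypothetical URLs, of how you might spot-check that a partner’s copy actually carries that link. It assumes the third-party requests and BeautifulSoup libraries and is illustrative only, not part of Google’s announcement.

```python
# Spot-check a syndicated copy for a rel="canonical" link back to the
# original article. Both URLs are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

ORIGINAL = "https://example.com/my-original-post"
SYNDICATED = "https://partner-blog.example.net/republished-post"

html = requests.get(SYNDICATED, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# rel is a multi-valued attribute in HTML, so look for "canonical"
# within each link tag's rel list.
canonical_hrefs = [
    tag.get("href")
    for tag in soup.find_all("link")
    if "canonical" in (tag.get("rel") or [])
]

if ORIGINAL in canonical_hrefs:
    print("Copy declares the original as canonical; search engines should credit it.")
else:
    print("No canonical link back to the original; the copy may compete in search.")
```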
I was so happy to see this change from Google. In my blogs and columns there is nothing more disheartening than to spend time putting together a thoughtful post, calling around for quotes and facts, and then seeing people lift it and drop it onto another site, without links to the people being quoted and without links to my writing. If they take a sentence or two, I’m flattered, and expect them to attribute it and link to the article I wrote. Using the whole article, or the guts of the article? Bah! Good for Google!
At last Google is going to make this happen. Hopefully there will be far less content copying. Real bloggers will be delighted to know this.
Sometimes we get sick of chasing the copycats in frustration. Hopefully this helps, and an effective algorithm will correctly identify the original author and give credit where it’s due.
It’ll be interesting to see how it all plays out. Seems like every time there is progress made in one area, those taking advantage just figure out a new work-around.
I’ve had a big problem with content scraping, as I post available domain names I’ve found on my blog for others to browse through and purchase if one interests them. It never fails: within a day of posting them, I see some other site with the same list of available domains. ;-) Oh well. The more the merrier, but a link back WOULD be nice!
Thomas @ 99niches.com
I’m a journalist. Journalists can use a sentence or two from copyrighted material for review or commenting purposes.
Rita blogging at The Survive and Thrive Boomer Guide
Yeah, it’s good news. I hope Google keeps checking their algorithm to make it work better…
I wonder if this new algorithm will accommodate the international arena. I get a lot of people from other countries, especially Greece, Spain, and Balkan and Asian countries, translating my content into their own language, so it is extremely hard to tell if my content has been stolen. Practically the only way I know is when they link back to me as their source to try and authenticate it. Could Google be smart enough to implement an inter-language algorithm?
Presuming that splog creators are generally lazy, they’re probably using automated translation tools to do this, and Google could well check for that. However, if the content has been translated (automatically or otherwise), it presumably will no longer rank above yours for English keywords, so on that level you shouldn’t worry. Yes, they’re still ripping off your content, which is bad, but they shouldn’t outrank you for your primarily English-speaking audience.
If they manage it properly then, believe me, the Web is going to be a far better place for ordinary users.
I’m waiting for the rollout and the results.
It would be reasonable to punish the republishers but to leave the original publisher be. Perhaps republished content should be tagged “noindex” by default?
I’m glad Google are focusing on the issue of duplicate content more closely. I work hard to produce unique and relevant posts for my blog and it’s so annoying when scraper sites simply copy my content and at times rank higher for it. I don’t think we will see an end to it straight away but at least Google have made their intentions clear.
As I understand US copyright rules, if you register copyright within 90 days of publishing, you can collect statutory damages. Without the registration, you can only collect actual damages. Now, it may not be worthwhile to register each blog article individually, but if you produce a lot of articles, it may be worthwhile to aggregate three month’s worth of them and register them all at once as a single item. Also, the discussion of having copyright as soon as you write it down? That only applies in countries that have signed on to the Berne Convention. The last time I looked, China had not.
@Christin: If you’re referring to the Cooks Source scandal, the article was stolen from a medieval cookery site, not a blog, although the original author did blog about it. Also, the content thief claimed not merely that anything on the Internet is susceptible to being “used” (unfortunately true), but that such usage is legal, i.e. that anything on the Internet is public domain (most emphatically not true). Then assorted people found out who else’s copyrights she’d been infringing, and in some cases, whose content had been plagiarized, and that was all she wrote. You don’t mess with the Mouse, or Martha Stewart. And they weren’t the only big names involved.
A general note here: copyright infringement and plagiarism are not the same thing. Plagiarism is presenting other people’s ideas as your own, or at least not crediting the original source; you can plagiarize material that’s in the public domain. Copyright infringement is using material under copyright without the copyright holder’s permission. The article that kicked off the Cooks Source scandal was not plagiarized, in that the original author’s byline was on it. However, since the magazine editor didn’t get permission to use it, the author’s copyright was infringed.
It’s good news for original bloggers!
And what if I own a quotes blog? The information on my website is all over the Internet. Who will be out-ranked in that case?
This will be a great development. Honest work will get the recognition and better results it deserves. I’m just hoping it works.
Hopefully this will boost my search engine traffic even more :)
Hi Darren,
This is good news for bloggers who put a lot of effort into creating their own unique and original content.
Thanks for sharing.
Mavis
I think it’s a good move by Google. People who do the hard work will be rewarded for their effort.
I write content for toptenz.net, and not only have I had entire (sometimes 2,500-word) articles stolen, I’ve had people leave comments accusing me of ripping off my own work from another site. So frustrating.
Hello,
This is interesting and quite the right way to go. But my question relates to news sites. A particular news story remains the same, but there are thousands of websites that put the same thing down in their own words. Any idea how Google will manage that?
Ever since reading about the change in Google’s algorithm, I have often wondered about that too. I guess we’ll just wait and see how they plan to handle it, bearing in mind that if they chose to penalize news sites for that, they would have to start with Google News.
This is so relevant. As a journalist turned blogger, I only use original content. However, a blog with a huge circulation asked if we could swap articles once a week. They said they would scrape one story from my blog every week and I could take one of theirs. We would credit one another. I spend a lot of time writing posts, and post 2-3x a week. So, getting a “freebie” once a week would be nice. I also thought that since their readership is a lot more than mine, that would bring me a few of their readers.
On the other hand, I know they have entered into this arrangement with other bloggers. So what I choose to scrape could also be scraped by others who entered into the same deal.
I haven’t taken the plunge yet, and I’m glad I read this article. Still, I’m tempted, since the other blog’s circulation is a lot bigger than mine.
To me, this is good news. I will no longer have to worry about those who copy my posts without giving me credit.
Hallelujah! There is no reason why a scraper site should rank above the original publisher.
As someone who creates original content for their blog, this is reassuring. I’m often frustrated that I’m not able to turn out 5 posts/week (I have a FT job, community involvement, not to mention a family!), but I’m happy that I’ve maintained my integrity by not ripping off someone else’s work.
Thanks for this information, Darren. You never disappoint!
Finally some good news from Google!
How do we know Google will get it right? We don’t want them deciding that a highly-ranked site that steals content is the original producer of that material.
I’m new to blogging, so my question is this: if I like a post, does this mean I can’t use it on my blog, or is it okay as long as I give credit where it is due?
This is a great topic, Darren.
No, I don’t republish content from other blogs; I try to rewrite things and put my own spin on them.
I’ve read many times that it makes a difference with search engines and improves your SEO.
Krizia
Great story, Darren, and I am really excited to see Google going this route! I won’t publish any content on our site, the Mountain Weekly News, that has been posted elsewhere. We have been trying to get Google News to consider us a valid news source, with no luck. One of the things they mentioned was that we had titles that were the same as other sites’. The one in question was “Is Sarah Palin Responsible for the Shooting in AZ?”; it turns out Perez Hilton had the same title on his site.
So at least they see what’s happening. I can promise you http://mtnweekly.com has 100% original content. I need to work on making my story titles stand out from the crowd.
Mike
Does ProBlogger show up on Google News?
I hope that Google is finally going to get this right! It’s about time the original source of the information ranked higher than the scraped masses of web spam.
Sure, there are legitimate uses for scraping, but I like to think that people would greatly benefit from having the original works show up above all the copies in the search engines. Why would anyone spend time and energy writing compelling and interesting content when their site is guaranteed to rank lower than some huge scrape-database site?
This is definitely a move in the right direction!
I’m glad that I read this, and thank you for writing it. I was under the impression that it was okay to use syndicated content as long as nothing was changed and the original source was acknowledged. I have a sports site and write almost everything myself, but I wanted to provide my readers with more stories about the teams and sports I cover; I stopped doing that a while back. I’m not trying to use ignorance as an excuse, but I thought it was okay. I’ve worked too hard to get a PR3 and want to be viewed as honest and credible. Should I go back and erase the old posts? I only used syndication for a few weeks and stopped a few months ago.
I’m really green when it comes to the blogosphere. I mini-blog (tweet and run a FB fan page)… if you can call that blogging at all.
I do lift content from URLs (and copy and paste tweets) on subjects I find relevant to my cause for my daily postings. Is this scraping, and does it affect the original creator negatively? (BTW, although I don’t ask for permission, credit to the authors is ALWAYS given in my posts.) I would’ve thought this was helping them spread their content to a different audience…
At any rate, would anyone let me know? I don’t want to harm anyone. Thanks.
Even my little blog has had stuff stolen from it! One of my readers told me about it.
What bugs me the most about this issue is the following: I’m Canadian, so my blog is http://www.landlordrescue.ca. I’m in the province of Ontario and have been in business for the last five years. Recently another company in Canada bought the .com domain, and unfortunately it seems they are dishonest. I regularly get calls from people who found my site while looking to serve that company with legal papers.
I’ve been kicking myself for not buying the .com when it was for sale a few months ago. But at $1,500 it was too much.
The only consolation in this entire thing is that my site ranks so much higher than theirs that even people searching for their company can only find me or my articles.
I usually find out when someone takes my content because I get a pingback; then I report them to their web host. It’s really easy: go to Whois and put in the IP address, then see who their server host is. The record usually lists their abuse email (see the sketch below).
Recently, I had one particular group of sites, 17 of them I think, lift an article from my website. Gatorhost was their host. They were great to deal with and had a simple form set up to fill out. They dealt with those scrapers, and there’s been no more copying of my site.
Report them!
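For anyone who wants to follow this approach, here is a minimal sketch of that lookup. It shells out to the system whois tool (which must be installed) and pulls any abuse contact out of the record; the IP address is a documentation-range placeholder, not a real scraper.

```python
# Look up the network owner of a scraper's IP and extract abuse contacts.
import re
import subprocess

ip = "203.0.113.45"  # hypothetical IP taken from a scraper's pingback

record = subprocess.run(["whois", ip], capture_output=True, text=True).stdout
# Keep only the lines that mention an abuse role, then pull out emails.
abuse_lines = [line for line in record.splitlines() if "abuse" in line.lower()]
emails = sorted(set(re.findall(r"[\w.+-]+@[\w.-]+", "\n".join(abuse_lines))))

print(emails if emails else "No abuse contact listed; check the host's website instead.")
```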
I don’t copy content from other sites unless I find it necessary for my readers.
This approach of Google’s is good, but sites like http://www.tuxmachines.org will be hit hard.
Tuxmachines is a site that serves interesting articles with proper credit.
I believe Google should make an exclusion for sites that give proper credit or pingbacks, or that share only partial content and forward readers to the original.
Google is the largest scraper on the Internet, and they have made a fortune scraping other people’s content and monetizing it. You can bet Google will not push their own results down in favor of original content.
Google has also been sued several times in the past for scraping news content… and yet they continue.
Just another case of Google saying “do as I say, not as I do.”
What if I rewrite an article in my own words? Will it be considered a copy?
How would this affect a site like AllTop?
I’m glad Google is making an attempt to clean up this monster they created.
A lot of the problem stems from the so-called ‘gurus’ who market their copycat software. They promise thousands, even millions of dollars if you simply buy their junk and push a couple of buttons.
I even know of a particular ‘auto-blog’ software package that not only steals content from RSS feeds, but actually runs it through three different languages of machine translation. Then, when it gets translated back into English, the ‘guru’ claims it’s now yours, because Google can no longer match it to the original content.
And don’t let me get started about the software that steals YouTube videos!
I said all that to say that as more and more people look to get online and make money, in their ignorance, they fall prey to these sharks not realizing that what they are teaching is unethical and in some cases illegal.
What Google should focus on is helping to develop a proper format or set of standards when it comes to blogging, affiliate marketing, etc., which IMHO would make great strides in separating the wheat from the chaff.
Similar to what W3 did with HTML and CSS.
Google should have a time- and quality-based algorithm. The first person to publish content online should get the top position, and those who publish on the same topic afterwards should get lower positions in the search results, just like finishing order in a race. The same goes for quality.
Similar content can be found everywhere. Knowingly or unknowingly, many websites and blogs may publish the same content. For example, the main details of a particular event or news story, or the technical specifications of a gadget or vehicle, remain the same on every website covering that subject, and it may be very difficult to find the original source just from crawling keywords.
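As a toy illustration of that first-published-wins idea (this is emphatically not Google’s actual algorithm), imagine crawling several URLs, fingerprinting their text, and crediting the earliest-seen copy in each duplicate group. Real duplicate detection would need fuzzier fingerprints than an exact hash, since scrapers tweak wording.

```python
# Toy "first crawled wins" ranking for exact-duplicate pages.
# URLs, timestamps, and text are made-up placeholders.
from hashlib import sha256

crawled = [
    ("https://original-blog.example.com/post", "2011-01-10T08:00", "full text of the post"),
    ("https://scraper.example.net/copy",       "2011-01-11T02:30", "full text of the post"),
]

first_seen = {}  # content fingerprint -> (earliest timestamp, url)
for url, timestamp, text in crawled:
    fingerprint = sha256(text.lower().encode()).hexdigest()
    best = first_seen.get(fingerprint)
    if best is None or timestamp < best[0]:  # ISO timestamps compare correctly as strings
        first_seen[fingerprint] = (timestamp, url)

for timestamp, url in first_seen.values():
    print(f"Treat {url} (first seen {timestamp}) as the original source")
```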
I represent a particular sector of industry and promote my clients online. There are many similar websites dealing with this subject, and you may find similarities in their content too. Nobody knows who will survive the stampede if Google starts filtering them out.
However, this industry is a victim of so-called black-hat SEO and keyword theft. Much of the time, searching for a particular brand name shows many spam results in the top ten, driving traffic to websites that deal in counterfeit and fake/replica brands.
If Google or other search engines can get rid of these spammers, it will be a pleasure.
What about websites with entertainment as their niche? For TV series, for instance, we often republish the official description of each episode. It would be weird to rewrite it, since we haven’t watched it yet.
This is great news! My blog has been scraped and content taken without permission multiple times. It’s impossible to keep up with all the feed scrapers, and they almost never comply with simple requests to remove content. At least with Google the scraping sites’ value will drop significantly, which may make them go about things in a more honest way. Just to note: I have never been asked for permission before my content was scraped.
Early in my blogging career, another blogger told me that reposting existing content is bad for SEO. So if I like something, I will write about it and post a link. Unique content is so much better anyhow.
I wonder how they determine which is the original content. The first to get spidered and catalogued? What if a person writes a blog post or article, someone copies it before it is found by Google, posts it on their own site, and then implements measures to get it found sooner?
This is a bit of a concern, as I quite often turn a longish comment I’ve made on someone else’s blog into a post of my own.
Since the comment will pre-date my blog’s post, I will now have to make sure I rewrite it or add a lot more to the content.
I don’t know if anyone has stolen any of my work, but I know I have borrowed work from others in the past. I always give them credit or link back to the source. I also set up my site so that viewers couldn’t steal any of my photos or text. I noticed, before I implemented that plugin, that a lot of viewers were after one photo of some artichokes I shot a while back.
Thanks for the post!
Semper Fi,
Manuel
But in the search results, copies are still going to show up if they are well optimized, as always happens with the article directories.
So if the original article ranks first, the copying site can actually rank second. And if it bothers to change the title to a better one, it can actually get more visits.
I must confess this is an awesome improvement in the search world.