
Fighting Scrapers With Your Left Jab

Posted by Darren Rowse on 20th of January 2008 in Miscellaneous Blog Tips

This guest post was submitted by Patrick, who blogs at Piggy Bank Pie Writing Services.

I started 2008 with a post that went viral on StumbleUpon and BloggingZoom. Even Skellie and Caroline Middlebrook picked up the story on their respective blogs. For those interested, the post in question was How I Received 850 Visitors Without Using Social Media Sites.

Image by Dave Hogg

But high visibility also comes with a price. Once a few bad guys hear about your blog, they hit you from behind like sore losers and run off with your content to monetize their own sites. These opponents are called scrapers. Let’s challenge them to a five-round match to see who deserves the title. Gentlemen, let’s have a clean fight: play by the rules, no punches below the belt and no hitting behind the head.

Definition Of A Scraper

Using hacking tools, scrapers subscribe to your site with the intention of stealing your content right off your syndicated feed. As soon as you publish a new article, the program fetches the entire post from your RSS feed and publishes a carbon copy on the scraper’s site. If you haven’t taken precautions, search engines can index the scraper’s copy before yours and even penalize you for duplicate content.
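If you suspect a page is a scraped copy, one quick check is to measure how much of its text overlaps with your post. The sketch below uses w-shingling with Jaccard similarity, a generic near-duplicate technique; it is not how Google or Copyscape actually detect duplicates, and the shingle size of 5 is an arbitrary assumption. It also assumes you have already extracted the plain text of both pages.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word runs ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one run.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
scraped = original + " Ads by Google"
print(jaccard_similarity(original, scraped))  # 0.625 (5 shared of 8 shingles)
```

A score near 1.0 means most of your post appears verbatim. A scraper that copies only an excerpt will score lower even though it is still copying, so treat the number as a hint rather than proof.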

Why Do They Do This?

Ever heard of Made For AdSense sites? Scrapers need content to feed their contextual ads, such as Google AdSense. Since they are unable to write their own (hey, no hitting below the belt!), they steal yours and publish it on their blog, where AdSense is usually used heavily. Scrapers often target specific keywords, and their goal is to steal articles that help them rank high in search engine results pages. With better rankings come more traffic and, obviously, more clicks on their ads.

What Can You Do?

I would LOVE to offer the ultimate solution for stopping scrapers from breaking the rules. However, it is not that easy, and the process can be time-consuming. Still, if you are ready to jump into the ring, here’s a five-round fight strategy that could bring your opponent down.

Let’s get ready to rumble.

1. License Your Content

The very first step I would recommend is to use a licensing service such as Creative Commons. By licensing your content, you at least inform visitors that the articles published on your site are protected by copyright law. A license also lets you specify under which conditions your work can be distributed. You can visit this page to choose the proper license for your work.

2. Add a Link To Your Original Post in Your RSS Feed

Joost de Valk, Shoemoney’s well-known web developer, just wrote a WordPress plugin called RSS Footer that automates adding a link to your RSS feed pointing to the original source of each post. Here’s what Matt Cutts, a Google engineer, said recently in an interview about linking to the original source of an article:

“…if the syndicated article has a link to the original source of that article, then it is pretty much guaranteed the original home of that article will always have the higher PageRank, compared to all the syndicated copies. And that just makes it that much easier for us to do duplicate content detection and say: ‘You know what, this is the original article; this is the good one, so go with that.’”

Installing and configuring RSS Footer is a piece of cake; I highly recommend you give it a shot.
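RSS Footer itself is a WordPress plugin written in PHP, and its code is not reproduced here. The Python sketch below only illustrates the general idea it automates: appending a link back to each item’s original URL inside the item’s description, so a scraper that republishes the feed verbatim also republishes your link. The function name `add_source_footer` and the footer wording are hypothetical, and the sketch assumes a plain RSS 2.0 feed whose items have `<link>` and `<description>` elements.

```python
import xml.etree.ElementTree as ET
from html import escape


def add_source_footer(rss_xml: str, site_name: str) -> str:
    """Append an 'originally published at' link to every <item> description."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        title = item.findtext("title", default="").strip()
        if not link:
            continue  # nothing to point back to
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        # Hypothetical footer wording; adjust to taste.
        footer = (
            f'<p>Originally published as <a href="{escape(link)}">'
            f"{escape(title)}</a> on {escape(site_name)}.</p>"
        )
        # ElementTree escapes this markup (&lt;p&gt;...) when serializing,
        # which is the usual way RSS carries HTML inside <description>.
        desc.text = (desc.text or "") + footer
    return ET.tostring(root, encoding="unicode")


feed = """<rss version="2.0"><channel><title>Blog</title>
<item><title>Hello</title><link>https://example.com/hello</link>
<description>Post body.</description></item></channel></rss>"""
print(add_source_footer(feed, "Example Blog"))
```

Putting the link inside the description, rather than relying on the item’s `<link>` element, matters because many scrapers copy only the post body.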

3. Report Scrapers To AdSense

Visiting your scraper’s site could help you gain a few points in the fight. Have you ever clicked the Ads by Google link on AdSense ads? It opens a page where you can sign up for both AdWords and AdSense. However, at the bottom of that page you will notice a link that says Send Google your thoughts on the site or the ads you just saw. The beauty of this link is that it knows where you are coming from (your scraper’s site), and it brings up a questionnaire about the relevance of your scraper’s ads. Now is the time to throw a left jab:

  • Click Report a Violation?
  • This brings up a question asking if the issue is with the website or ads, select website;
  • You will now be asked which policy is violated, select The site is hosting/distributing my copyrighted content;
  • Finally, use the text box under Add additional information here to explain your story to the referee.

4. Report Scrapers To Google

Now is the time to send your opponent to the floor for a first count of 8. Go to google.com, type your scraper’s domain in the search field and hit the Google Search button. If Google finds your scraper’s site, his website is indexed by Google. Now go to Google’s Report a Spam Result page and proceed as follows:

  • Exact query that shows a problem: Type what you entered in Google’s search box to find your scraper’s site
  • Resulting Google page that shows problem: Enter the complete URL of the Google page returning the search result
  • The specific web page or site that is misbehaving: Type your scraper’s domain name
  • Type(s) of problem (check all that apply): Select Duplicate site or pages
  • Enter your story in the Additional details text box and click the Submit button.

5. Report Scrapers To Their Web Hosting Service

This is the ultimate opportunity to hit with a multi-punch combination. Go to whoishostingthis.com and type your scraper’s domain in the search box. This gives you a link to your scraper’s web hosting company. Once you are on the provider’s home page, look for a contact page. Use online chat, email or a contact form to explain the situation. If you are required to provide a full and complete DMCA notice, I suggest you visit this page to get the DMCA form. If you go through all of this and your scraper gets kicked out by his web hosting service, consider the fight won by unanimous decision.
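whoishostingthis.com does this lookup for you. As a rough do-it-yourself starting point, you can resolve the scraper’s domain to an IP address and check the reverse DNS name of that IP, which often contains the hosting company’s own domain. This is a sketch under assumptions: the reverse name may be missing or may belong to a CDN rather than the real host, and the WHOIS record for the IP block is a more reliable source of an abuse contact. `hosting_hints` is a hypothetical helper name.

```python
import socket


def hosting_hints(domain: str) -> dict:
    """Resolve a domain and look up the reverse-DNS name of its IP address."""
    # Raises socket.gaierror if the domain does not resolve at all.
    ip = socket.gethostbyname(domain)
    try:
        # The PTR name often embeds the host's domain (hypothetical example:
        # "server42.hostingco.net").
        reverse_name = socket.gethostbyaddr(ip)[0]
    except OSError:
        # Many IPs have no PTR record; socket.herror is a subclass of OSError.
        reverse_name = None
    return {"domain": domain, "ip": ip, "reverse_dns": reverse_name}
```

Calling `hosting_hints("scraper-example.com")` (a placeholder domain) returns the IP and, when one exists, a reverse DNS name you can search for to find the provider’s contact or abuse page.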

Summary

While this may not be an instant solution for stopping scrapers from stealing content, it can surely make their lives more difficult. If everything goes well and the scraper gets banned from AdSense, Google and his service provider, well, that’s a technical knockout. Now let’s just hope he’ll be out of the ring once and for all.

Has your content ever been stolen by scrapers? Have you tried some of the above strategies? Do you have other ideas to share? Please join the conversation in the comments.

About Darren Rowse
Darren Rowse is the founder and editor of ProBlogger Blog Tips and Digital Photography School. Learn more about him here and connect with him on Twitter, Facebook and LinkedIn.
Comments
  1. As I’m still really small, I don’t have this problem, but I can really see it being a problem for bigger sites.

    Really good tip about the RSS Footer – thank you, I will check it out.

    Staale aka Vikingblogger

  2. Although my German blog is fairly new (one week) my content has already been stolen :(

    But the most common is, that only a small excerpt is “stolen”, usually with the backlink to my page.

    But I don’t even like this. The whole blog that does this consists only of these small excerpts (from other blogs, too) and is sometimes ranked higher than mine.

    My question: Is it legal to copy the beginning (and of course the headline) of almost every one of my posts?

    I know, that you can quote someone and usually it’s cool when you’re featured by another “regular” blogger. But a collection of excerpts? Is that O.K.?

  3. And you can use http://www.copyscape.com to find duplicate pages on the web …

    Scrapers aren’t the only problem though. Some people just steal your concepts and turn them into their own articles. If you’re a starting blogger who doesn’t have enough backlinks or ranking, then these thieves basically benefit from YOUR work, because they will rank more easily and get more exposure than the original blogger …

    All I can say … be careful ….

  4. Useful post. Thanks.
    The nice thing about MFA sites though is that their trust rank is usually so low that they are hardly ever seen as the writer of quality content. And Google is usually pretty good at seeing through them.
    The worrisome scrapers are the ones that actually have a worthwhile site with some original content and call themselves ‘aggregators’. They are harder to catch by Google et al. and they could be ranked above the original article.

  5. Wow! This is an awesome post. I’ve had a little bit of trouble with scrapers in the past. This looks like a 50-45 fight to me! Thanks a lot!

  6. Nice article. Motivated me enough to install the plugin :)

  7. I have a lot of scrapers taking content. Mostly they get ignored, but sometimes Google ranks scrapers higher than the original. The worst was when a scraper took an article and got on the homepage of Digg with it. That was the worst experience.

  8. I had my content stolen by scrapers when I had my previous blog. I was so furious, but I felt helpless, too. I had no idea what I could do about it other than to send them nasty comments, which of course doesn’t help. I’ll have to keep these tips in mind. Thanks!

  9. Nicely worked out article.

    Gives some great steps to take if you find someone has stolen your work.

    This will come in handy I am sure

    Thanks

  10. When I write posts, I often times include my website name within the post. Then when it gets scraped, it doesn’t make any sense. But I also like the RSS footer idea.

    David Airey wrote a similar post about people stealing your images. One person stole his post and images, but now their site looks really bad. The post is How to deter thieves from stealing your images and server bandwidth.

  11. Great post. Now I have some work to do internalizing it and getting my blogs “plugged in.”

  12. I’m saving this for later, because I know I’ll need it, THANK YOU!

  13. Following all these steps is just too tiresome, especially if you get scraped by a thousand and one scrapers. It might be viable for small blogs, but it’s just impossible for the big blogs. Seriously, rather than wasting my time contacting AdSense, Google and their hosts, I’d rather spend my time marketing my blog….

  14. >> they steel yours and publish it on their blog

    *steal*

  15. Setting up Google Alerts on article titles can be quite useful as well, and there’s also a handy site called http://www.copyscape.com which will let you run a few free queries each day to help find content so similar to yours that it must have been either scraped or copied and then rewritten.

  16. This is great information, but the paranoid in me is wondering what happens if someone used these avenues to wrongly accuse you of content theft. Does the accused get a chance to defend themselves, or could this be a great/awful dirty trick of conviction by accusation? If I were a thieving scraper looking at losing my precious AdSense account, I would probably swear that YOU were the real villain.

  17. Great info! I keep you on my rss feed just for all the great tips like this.

  18. Here’s what I’ve done:

    1) Modified related posts plugin to show related posts in the feed.

    2) Added a link back to my homepage in the RSS.

    This gives me a minimum of 6 links back to my page on every scraped article. I’ll probably add the link to the original source as well… good idea for the RSS footer plugin.

    When it comes right down to it, at some point it’s not worth worrying about scrapers anymore… it’s a losing battle.

    Unless you are losing traffic because of the scrapers, Google is pretty good about detecting them and removing them from the index.

    The tips in this article are good, though… I’ve had a couple of instances where fairly big sites were outright stealing my content, and one instance a long time ago where they outranked me with stolen content… so I used some of these tips with good results.

  19. Unfortunately, I couldn’t get RSS Footer to work with Feedburner, even after resyncing. And yes, my little site gets scraped. :(
    The other alternative is Angsuman’s Feed Copyrighter, which does work well with Feedburner, both through the feed and email. Its only limitation is that it doesn’t add the post URL, which if I get brave enough, I might try to figure out how to add.
    Also, be aware you WILL have to file a DMCA report with Google. And it can’t be through email. It must be faxed or snail mailed in.
    And one of the best resources for help is through Plagiarism Today, Jonathan Bailey’s site.

  20. Hi Guys!

    I have had exactly the problem you’re describing here. The first thing I did was write a story about that guy and his “work”. But that didn’t work very well… so more scrapers came…

    Some days ago I found the story on Chris Gerret’s blog about RSS-Sticky and implemented it in my blog. That should help at least a little.

    Mathias
    germany

  21. Went through and added my code to all my first paragraphs of my $$ site

  22. Darren, Patrick,
    thank you for this valuable information. I’ll install the plugin soon, too.
    It is offensive when you write a unique article with absolutely unique and maybe personal content, and some scrapers, as you call them, prey on you and steal your work.
    Fortunately I kind of see a growing community of fair online users, supporting each other and still making online profits.

  23. Very nice.

    One point to add: I read recently that a lot of scraper sites are self-hosted, meaning that when you complain to their web hosting service, you are complaining directly to the scraper himself, masquerading as a hosting company through a reseller account. They will then “suspend” the account for a few months, then bring it back when no one is looking. They host hundreds of similar accounts so that there will always be accounts that are not “suspended.”

  24. I am glad no one has taken any content from http://www.livelymoney.blogspot.com. Thank you for those tips, and I hope you become like ProBlogger one day.

  25. I find a lot of my posts get “comments” from other sites who essentially will automatically pingback to my posts with an excerpt and a link to the original.

    Is that considered being scraped? I always delete the comments because the sites who pingback in that manner never have original content, they seem to just be compilations of many sources. Why is this done?

    TIA for any help!!

  26. I wouldn’t call them “hacking” tools. There’s no hacking going on. They are just taking your content. That’s like calling a Xerox machine or a CD burner a “hacking tool”.

    Wrong and bad? Yes. Hacking? No.

  27. I am very happy no one has taken any content from http://www.bloggers-help.blogspot.com
    http://www.flexsamples.blogspot.com
    http://www.flexexamples.blogspot.com

    These blogs are great.

    thanks

  28. I’ll second Joost’s plugin, works great and it’s a nice solution

  29. I am glad no one has taken any content from [insert my own spamming URL here].

  30. #2 is awesome, I’m going to do that ASAP. my fitness blog got scraped and so did my garden blog, humph.

  31. Thanks for the tip on Joost’s RSS footer plug-in. I’ll add it as a precaution.

  32. Great post, Patrick. You’ve reminded me to install that RSS Footer plugin, which is languishing in my downloads folder.

  33. very useful posting
    I’ll be sure to d/l that plugin

    Thanks!

  34. Just install Joost’s plugin, or better yet make a habit of deep linking your content from within your own posts.

    Besides that the other steps are a waste of time and not productive. Get over it, would be my advice.

  35. My blog came standard with follow comments and the footer link in the RSS. Serendipity ( http://s9y.org/ ) is my favorite blogging platform for these and many other reasons. All good advice though.

  36. Great post since I didn’t even know that such a program existed to scrape my content.
    I look forward to the day when I am getting enough traffic for this plugin to be useful.

  37. @alanj878: Thanks for the kind words. I hope too, you can help me to get there by subscribing to my RSS feed at http://feeds.feedburner.com/piggybankpie :-)

    Anyone using a Creative Commons license?

  38. The problem is that Google Adsense doesn’t really care. I get scraped often and have reported to Google. They have emailed me back a canned response and never followed through with a ban.

    The Shoemoney option is definitely the best that I see.

  39. Adding a related-posts-for-feeds plugin also helps by giving you more links back to articles on your site from the scraper site.

    Some of these sites have the gall to give your original post a pingback.

  40. Thanks so much for collecting all this information in one place. This happened to me a while back, and I got them to stop, but next time, I’ll be able to review this info, too!

    The site actually published the part of the feed that says, “unsubscribe,” so I unsubscribed them from mine and the other blogs they were scraping! Ha! Take that!

  41. Good info. Adding to ‘To Do’ list….

  42. This guide is worth saving to delicious, Patrick. Excellent post.

  43. Excellent plugin by Joost. Thanks for that tip; since I am getting killed by other plugs ripping my feed, this may help me a lot.

  44. Thanks for this. I had no idea how I could fight this; I’ll be sorting all this out asap…

  45. Nice job Patrick! Very timely piece for me…I just checked my Technorati information to find this link:

    http://cestgratuit.hautetfort.com/archive/2008/01/17/the-ten-most-inspirational-bloggers-of-2007.html

    They stole my post, word for word. What do you do?

  46. Great post Darren, so far I was hoping that my scraper would get off my back some day, but I see it’s time to take action. Although, many people say that I should just ignore it. Hmmm. BTW, you haven’t mentioned sending an email to a scraper, or posting a comment? You think they wouldn’t listen?

  47. I am happy to say that the only people scraping my feed are posting some lame excerpts. I don’t really think that is hurting me, as the excerpt link goes back to my site too … but then again I don’t run on advertising revenue so it may be a different story if you were.

    So what do you think? Is the syndication of an excerpt posting copyrighted content too? Or would they be in violation only for full posts?

    I don’t get how anyone would make money off an adsense site that just posts 2-sentence excerpts anyway.

  48. I looked at the license, but the part that bothered me was modifications. It seems like your choices were to allow modifications totally or not at all. Would not allowing modifications prevent people from posting excerpts for discussion’s sake, like if they wanted to make a long quote?

  49. @ Marija: If Comments are turned on, scrapers obviously moderate them, so yes the guy gets your comment but you in return get an instant delete. As for the email, most of the scrapers do not have a contact page. I found an email address via the scraper’s domain whois information, tried emailing the owner of the domain, but again, obviously zero answer.

    @ Nikole Gipps: Either full or partial scraping is the same, no doubt about it.

    Thanks to all for the kind words.

    Patrick

  50. JW – I get a lot of the same thing.
