This guest post was submitted by Patrick who blogs at Piggy Bank Pie Writing Services.
I started 2008 with a post that went viral on StumbleUpon and BloggingZoom. Even Skellie and Caroline Middlebrook took the story back to their respective blogs. For those interested, the post in question was How I Received 850 Visitors Without Using Social Media Sites.
Image by Dave Hogg
But high visibility also comes with a price. Once a few bad guys hear about your blog, they hit you in the back like sore losers, running away with your own content to monetize their sites. These opponents are called scrapers. Let’s challenge them to a five-round match to see who deserves the title. Gentlemen, let’s have a clean fight: play by the rules, no punches below the belt and no hitting behind the head.
Definition Of A Scraper
Using automated tools, scrapers subscribe to your site with the intention of stealing your content right off your syndicated feed. Once you publish a new article, the program fetches your entire post from the RSS feed and publishes a carbon copy on the scraper’s site. If you haven’t taken any precautions, search engine crawlers can index the scraper’s copy before yours, and even penalize you for duplicate content.
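Before you step into the ring, it helps to confirm that a suspect page really is a carbon copy of your post. Here is a minimal sketch of one way to check, assuming you have already saved the text of your post and of the suspect page as plain strings. The function names and the five-word window are illustrative choices of mine, not part of any tool mentioned in this post.

```python
import re


def shingles(text, size=5):
    """Break text into overlapping runs of `size` consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Share of the original's word runs that also appear in the suspect text."""
    own = shingles(original, size)
    if not own:
        return 0.0  # original is shorter than one window, nothing to compare
    return len(own & shingles(suspect, size)) / len(own)


my_post = "Once you publish a new article the program fetches your entire post"
scraped_page = "Cheap ads here. " + my_post + " Click our sponsors!"
unrelated_page = "A recipe for apple pie with cinnamon and a buttery crust on top"

print(copied_fraction(my_post, scraped_page))
print(copied_fraction(my_post, unrelated_page))
```

A value close to 1 suggests the page reproduces your post wholesale, while a value near 0 suggests the overlap is incidental. Where to draw the line between the two is a judgment call.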
Why Do They Do This?
Ever heard of Made For AdSense? Scrapers need content to feed their contextual ads such as Google AdSense. Since they are unable to write their own (hey, no hitting below the belt), they steal yours and publish it on their blogs, where most of the time AdSense is used heavily. Scrapers often target specific keywords, and their goal is to steal articles that help them rank high in search engine results pages. With better ranking comes better traffic and, obviously, a better click-through rate on their ads.
What Can You Do?
I would LOVE to write the ultimate solution for preventing scrapers from playing against the rules. However, it is not that easy, and the process might be time-consuming. But still, if you are willing to jump into the ring, here’s a five-round fight strategy that could potentially bring your opponent down.
Let’s get ready to rumble.
1. License Your Content
The very first step I would recommend is to use a licensing service such as Creative Commons. By licensing your content, at least you inform visitors that articles published on your site are subject to copyright laws. This allows you to specify under which conditions your work can be distributed. You can visit this page to choose the proper license for your work.
2. Add a Link To Your Original Post in Your RSS Feed
Joost de Valk, Shoemoney’s well-known web developer, just wrote a WordPress plugin called RSS Footer that automates the process of adding a link in your RSS feed that points to the original source of your post. Here’s what Matt Cutts, a Google engineer, said recently in an interview about linking to the original source of an article:
“…if the syndicated article has a link to the original source of that article, then it is pretty much guaranteed the original home of that article will always have the higher PageRank, compared to all the syndicated copies. And that just makes it that much easier for us to do duplicate content detection and say: ‘You know what, this is the original article; this is the good one, so go with that.’”
Installing and configuring RSS Footer is a piece of cake; I highly recommend you give it a shot.
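If you are not on WordPress, the same idea can be approximated by hand. The sketch below is not how RSS Footer is implemented; it is only an illustration in Python, assuming a plain RSS 2.0 feed whose items carry `title`, `link` and `description` elements. The function name `add_source_footer` and the sample feed are made up for the example.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical one-item feed, used only to demonstrate the function below.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example Blog</title>
<item><title>My Post</title><link>https://example.com/my-post</link>
<description>Full text of the post.</description></item>
</channel></rss>"""


def add_source_footer(rss_xml, site_name):
    """Return the feed with a link back to the original post in every item."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        description = item.find("description")
        if not link or description is None:
            continue  # no permalink to point at, or no body to append to
        title = item.findtext("title") or "this post"
        footer = '<p>This post, <a href="{0}">{1}</a>, first appeared on {2}.</p>'.format(
            html.escape(link, quote=True), html.escape(title), html.escape(site_name))
        description.text = (description.text or "") + footer
    # ElementTree escapes the footer markup when serializing the description.
    return ET.tostring(root, encoding="unicode")


print(add_source_footer(SAMPLE_FEED, "Example Blog"))
```

Because the footer lives inside each item’s description, a scraper that republishes the full item also republishes the link back to your original post, which is the situation Matt Cutts describes.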
3. Report Scrapers To AdSense
Visiting your scraper’s site could help you gain a few points in the fight. Have you ever clicked the Ads by Google link on AdSense ads? This opens up a page where you can sign up for both AdWords and AdSense. However, if you look at the bottom of the page you will notice a link that says Send Google your thoughts on the site or the ads you just saw. The beauty of this link is that it knows where you are coming from (your scraper’s site) and it fires up a questionnaire regarding the relevance of your scraper’s ads. Now is the time to throw a left jab:
- Click Report a Violation?
- This brings up a question asking if the issue is with the website or ads, select website;
- You will now be asked which policy is violated, select The site is hosting/distributing my copyrighted content;
- Finally, use the text box under Add additional information here to explain your story to the referee.
4. Report Scrapers To Google
Now is the time to send your opponent to the floor for a first count of 8. Go to google.com, type your scraper’s domain in the search field and hit the Google Search button. If it finds your scraper’s site, this means his website is indexed by Google. Now go to Google’s page to Report a Spam Result and proceed as follows:
- Exact query that shows a problem: Type what you entered in Google’s search box to find your scraper’s site
- Resulting Google page that shows problem: Enter the complete URL of the Google page returning the search result
- The specific web page or site that is misbehaving: Type your scraper’s domain name
- Type(s) of problem (check all that apply): Select Duplicate site or pages
- Enter your story in the Additional details text box and click the Submit button.
5. Report Scrapers To Their Web Hosting Service
This is the ultimate opportunity to hit with a multi-punch combination. Go to whoishostingthis.com and type your scraper’s domain in the search box. This gives you a link to your scraper’s web hosting company. Once you are on the home page of the provider, look for a contact page. Use online chat, email or a contact form to explain the situation. If you are required to provide a full and complete DMCA notice, I suggest you visit this page to get the DMCA form. If you go through all of this and your scraper gets kicked out by his web hosting service, consider that you’ve won the fight by unanimous decision.
While this may not be an instant solution for preventing scrapers from stealing content, it can surely make their lives more difficult. If everything goes well and the scraper gets banned from AdSense, Google and his service provider, well, that’s a technical knockout. Now let’s just hope he’ll be out of the ring once and for all.
Has your content ever been stolen by scrapers? Have you tried some of the above strategies? Do you have other ideas to share? Please join the conversation in the comments.