
Stop Scrapers and Spammers Fast

Posted by Darren Rowse, 20th of October 2009, in Miscellaneous Blog Tips

One of the challenges bloggers face is what to do when others use their blogs for their own gain, either by taking their content or by spamming their comments. The more I talk to bloggers about how they deal with these issues, the more I realize how many different approaches there are. Today Seth Waite from Blogussion shares his approach. I’d love to hear yours in the comments below, whether it is different or the same.

Every blogger quickly learns the reality of hard work in blogging. After the “make money fast” hype has worn off and the reality has set in that blogging is a great way to earn an income if you work for it, you are left with a choice.

The choice is whether to stay in blogging or not. Many bloggers decide to stay but then face another extremely important decision: should I put in the effort to become a great blogger, or keep trying to do things the easy way and hope things will be different for me?

Those who choose to work hard start learning and eventually find success by studying, networking, and earning their way to better blogging. Bloggers who are unwilling to face reality either quit or eventually become spammers, scrapers, or beggars.

I am not going to address the problem of bloggers who beg for help without working for it, but I do want to talk about spammers and scrapers. Most importantly, I want every hard-working blogger to know how to stop selfish bloggers from disrespectfully using their work for their own benefit.

Stopping Spam

The easiest way to stop spammers who are trying to get links to their blogs or sites is to control your comments and trackbacks. Although comments are essential to building a great blog community, they must be moderated so that your actual readers feel comfortable with the discussions on your blog.

Captcha

At first, commenting was easily controlled by forcing commenters to enter their email address in the comment form. Spammers quickly got around this; now a very easy way to stop them is to add a captcha to your blog comments.

Captcha is already used by Blogger and can easily be added to WordPress and other blogging platforms with plugins. It works by asking you to type a series of numbers or letters shown in an image before your comment is posted. Other systems ask you to add two numbers or answer another easy question instead. Using captcha is a quick and easy way to minimize your blog’s spam, but it may also annoy regular readers.
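
To make the “easy question” variant concrete, here is a minimal Python sketch. It is not how Blogger or any particular plugin implements captcha. It assumes a hypothetical in-memory store (`_pending`) keyed by a random challenge id that the comment form would carry in a hidden field; a real blog would keep the expected answer in the visitor's session or database.

```python
import random
import secrets

# Hypothetical in-memory store: challenge id -> expected answer.
# A real blog would keep this in the visitor's session or a database.
_pending = {}

def make_challenge(rng=random):
    """Create a simple addition question and remember its answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    challenge_id = secrets.token_hex(8)
    _pending[challenge_id] = a + b
    return challenge_id, f"What is the sum of {a} and {b}?"

def check_answer(challenge_id, answer):
    """Return True only if the answer is correct; each challenge works once."""
    expected = _pending.pop(challenge_id, None)  # pop makes it single-use
    if expected is None:
        return False  # unknown or already-used challenge
    try:
        return int(answer.strip()) == expected
    except ValueError:
        return False  # non-numeric input, e.g. a bot filling every field
```

Removing the answer from the store on the first check makes each question single-use, so a bot cannot solve one question and replay the same answer on every comment.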

Plug-ins

For many blog platforms, like WordPress, a simple plug-in will solve many spam problems. The most common spam blocker is Akismet, which is now available for over 20 other blogging platforms besides WordPress. Using it is simple: you only need to check occasionally that legitimate comments are not being flagged as spam. Beyond its normal comment protection, it goes further than captchas by also protecting your blog against unwanted trackbacks.

Stopping Scrapers

Scrapers are bloggers who steal content you produced and put the entire work on their own blogs and websites. The practice is sadly common and spreads copies of your content around the web. Luckily, most search engines are good at recognizing the original content, but scraping is illegal and damaging to both the blogger and blogging.

  1. Identify: The first step to stopping scrapers is to find copies of your content. An easy way to do this is to use CopyGator or Copyscape, which check your content’s originality and flag potential duplicates.
  2. Ask: Once you have found scrapers who have copied your material [note: the duplication should be significant, and their aim should be to present your content as their own, not to promote yours], email the owner or comment on the blog or site where the duplicate is found. In most cases the scraper will take it down and apologize for misrepresenting the work. Always try this first, so that the blogosphere can stay friendly and new bloggers who might be making an innocent mistake can learn without being accosted.
  3. Block: If they are unresponsive or belligerent, the next step is to use .htaccess to block the scrapers from your blog. This can be a little tricky if you have never done it before, but WPShout has a great guide to stopping scrapers [item #9]. Basically, you are blocking the scrapers from accessing your blog and RSS feed.
  4. Take Action: At this point you have been nice, notified them of their misdeed, and blocked their access, yet your content is still on their site. The next way to get it removed is to contact the site’s ISP or web host. The easiest way to find out who that is is to enter the site’s web address into the search bar at Who.is; the hosting information will show up with the rest of the site’s details. Once you have the host information, contact the host with a formal letter or email stating specifically what the content is, where it originated, and where it has been reproduced. The host will then usually take down the content quickly and offer the site owner a chance to explain themselves. A warning: this is serious for everyone involved, so do not use it lightly. If this does not work, there is one more option: legal action. Whether you can file suit depends on the scraper’s home country and legal system.
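
Steps 1 and 3 can be sketched in Python under some loudly stated assumptions. CopyGator and Copyscape do not publish their matching methods, so `overlap` below swaps in a common technique, word shingling: it reports what fraction of your post's five-word sequences also appear on a suspect page. `is_blocked` mimics the per-IP deny rules an .htaccess file applies; the hypothetical `BLOCKED` list uses reserved documentation address ranges, not real scrapers.

```python
import ipaddress
import re

def shingles(text, k=5):
    """Return the set of k-word sequences ("shingles") in lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(original, suspect, k=5):
    """Fraction of the original's shingles that also appear on the suspect page.

    Containment rather than Jaccard similarity, because a scraper page usually
    wraps the copied post in its own headers, ads and sidebars.
    """
    mine = shingles(original, k)
    if not mine:
        return 0.0  # post too short to compare meaningfully
    return len(mine & shingles(suspect, k)) / len(mine)

# Hypothetical blocklist (reserved documentation ranges, not real scrapers).
BLOCKED = [ipaddress.ip_network(n) for n in ("203.0.113.7/32", "198.51.100.0/24")]

def is_blocked(ip):
    """True if the requesting address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)  # IPv4 vs IPv6 mismatch is simply False
```

A high overlap on a page that does not credit you makes it a candidate for step 2; where to draw the line is a judgment call. Blocking a whole range such as the /24 above also shuts out any legitimate readers who share it, which is the risk one commenter raises below.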

Stopping scrapers and spammers will not only protect your work but also help make the internet a better place. Every time a spammer is thwarted, other bloggers win too. So be an internet community builder by taking the proper steps to stop content thieves.

Seth Waite is Editor at Blogussion.com and enjoys helping every blogger reach their blogging goals. To contact Seth directly, find him on Twitter @Seth1492.

What’s Your Approach

From Darren: as mentioned in the introduction to this post, bloggers take many different stances on these issues, particularly when it comes to scrapers. Many take a similar line to Seth, while others are more lenient: as long as someone is reading their content somewhere, it doesn’t worry them. What do you do? What tools do you use?

About Darren Rowse
Darren Rowse is the founder and editor of ProBlogger Blog Tips and Digital Photography School. Learn more about him here and connect with him on Twitter, Facebook and LinkedIn.
Comments
  1. Thanks for linking to WPShout! You say it’s quite a hard thing to do, but follow the instructions and you’ll be fine! Any problems feel free to leave a comment on WPShout :)

  2. Thanks so much for this!

    This is an important topic. I have been plagued by scrapers from time to time, and at first I had no idea what to do.

    The first time that it happened I remember being really upset. I wish that I’d had this post, with its practical steps, handy back then.

  3. So what about the spammers that get around Captcha? They leave things like “Internet Marketing” as their name and link to their crappy site, but actually leave a semi-relevant comment?

    You know, it’s barely on subject, but just enough not to want to flag or delete it.

  4. Yeah, it’s a shame we have to resort to things like this, but it’s true, you have to have “blog protection” in place or you will find your content without your name on it. Thanks for the reminder.

  5. I have had content stolen from me several times but have never had to go past step one. Thanks for the additional information in case I ever have to go to the next step.

  6. I really hate captcha, as it is an obstruction to commenting. I mostly rely on Akismet and two more plugins, and that helps me the most. Otherwise I find it hard to check my spam messages to pull out genuine ones….
    Moreover, blocking IPs via .htaccess is something I would not suggest, as you might block an entire network….

  7. We’ve found the WP plugin “Cookies for Comments” a nice complement to Akismet. It stops the bots from posting in the first place.

  8. Great info!

    This is a problem that I have suffered with in the past!

    Not had a problem recently though!

  9. At the moment I am relying on two tools: Akismet and me. Akismet is a great tool for eliminating spam, but it can’t do the whole job for you. The rest is up to me: I go through each comment and check whether it relates to the topic, and I manually check the website linked in the comment to see whether it is suitable for a link back.

  10. I’ve had several run-ins with scrapers and splogs, and one in particular with whom I’ve yet to reach a resolution. That person is basically passing off my posts as his/her own.

    Some additional tools I use are:

    Google Alerts (http://www.google.com/alerts)

    FairShare (http://www.fairshare.cc/fairshare/) alerts you when a certain percentage of your content is being reused

    tracer (http://tracer.tynt.com/) which embeds a message and link to Creative Commons license when someone pastes text copied from your site

  11. As to Darren’s footnote, scrapers hurt bloggers tremendously, often ranking above original content in search engines for inexplicable reasons. So I do what I can to fight being scraped.

    The problem is that

    1) most scrapers are automated rather than some kid copying your posts manually;

    2) most therefore do not post contact information;

    3) most are hosted on overseas servers, making them immune from US DMCA takedown notices.

  12. Good info to know.

  13. I’ve always been pretty lenient with this stuff. My primary strategy has always been cross linking within the content of my posts, which is now made even easier with any of a handful of plugins that will automate this for your WordPress blog.

    As long as the content links back to me, I really don’t care what someone does with it, and even most of the hardcore scrape and spam guys are too lazy to take the links out.

    I am looking at the solutions mentioned in this article to investigate a more proactive stance toward identifying and targeting the really bad stuff that doesn’t even give the credit.

    Ultimately though I suspect that the time spent chasing these guys down and getting your content back from them might be more profitably spent creating more content.

  14. Hey Seth.

    Good to see you here. For those who don’t know, Seth is also one of the many knowledgeable people you may run into in the problogger.com forums.

    I saw the first scraper of my material a few days ago, so I got a couple of these steps down, but it is good to know the whole order of what to do. I will try the .htaccess method as there is no contact form on their site.

    Thanks for the information.

  15. I have had a couple of cases where a homepage on someone’s “blog” used my headline and excerpt, but when I clicked the “read more” it linked over to my blog.

    I honestly wasn’t really sure how to handle this. They didn’t steal all my work, but they did steal the headline and copy word for word. I just blocked the trackback and moved on with life.

    Could this hurt me with the “duplicate content penalty?”

    Regarding Captcha, I think it is a horrible inconvenience to your readers. It seems like half the time, even if you enter the right code, you “fail”, which leads to losing your comment. I have stopped commenting on and/or reading various blogs because of this.

    I have seen others that are simple like “what is the sum of 2 and 2” which is much more reasonable.

    I also think Disqus is a great way to curb spammers.

    When it is all said and done, I think monitoring your comments is the best way to stop spam (mixed with Akismet). You should be proactive about reading your comments anyway, so having spam up for a couple of hours or having to delete 3-5 comments a day isn’t really a huge deal.

  16. @Thomas – Tracer sounds like a really awesome deal. Do you know how reliable it is?

  17. I get a stack of scrapers on totalapps.net; they generally don’t take images but just copy and paste the text into their own blogs. The odd person takes an image.

    Do they really gain from doing this? I just assumed it was a fact of life and never bothered chasing it up. Perhaps I should?

  18. Michael Gray had a related post, but with a positive spin, looking at scraper sites as link-building opportunities. He suggested using the RSS Footer WordPress plugin by Joost de Valk. An interesting take:
    http://www.wolf-howl.com/seo/use-scrapers-to-build-links/

  19. I recently installed captcha on my blog to try and keep out the auto-joiners who would then progress on to either comment spam or post their email spam coding into my home page and link to it. Since I installed captcha – no more problems. Great post tho, will take a look at the scrapers section.

  20. I use a plugin called AntiLeech. It doesn’t work with all scrapers but it does work for most of them.

    Basically, you tell the plugin what IP is stealing and reproducing your content. Next time this IP comes to steal more content, the plugin will feed it with a text you have previously defined.

    For example, my text to display on scrapers would be something like: “this article was stolen from http://www.iphonedownloadblog.com/ Stop reading this site. It steals content from other blogs”.

    Sometimes, it takes months for the scraper to realize it and during this time, you have fed his site with your links all over the place.

  21. I use Akismet for spam, and it really does a good job. Rarely does anything sneak through.

  22. Scraping is a big problem for blogs, especially technology blogs, of which there are a great many.

    I am facing these scraping issues a lot nowadays…
    Also, sometimes people copy an entire post from my blog and give a small “Source” credit with a followed link….

    Well, that’s good enough…. But I have seen many of these scraped posts rank higher than my original post… this is what pains me, as it hurts my traffic…..

    Sadly, that’s because Google gives more importance to pages with backlinks and so on…. even if the content is copied.

    So what I do is embed as many hidden links back to my site as possible… for example, a 1px*1px image linking to my homepage and some of the full stops [.] linking back to the original posts.

    This technique sounds crazy… but I found it does fetch you some backlinks….

    :)

  23. I think the level of scraping is an important element to evaluate. If it is one article and they are a small blog, you might want to shoot off a quick email or just forget about it altogether.

    If you find your content is being completely scraped then more serious action should be taken because valuable readers are being taken from your efforts.

  24. My tattoo blog content was regularly getting stolen. I installed the WordPress plugins AntiLeech and Simple Trackback Validation. Those plus Spam Karma 2 seem to have put a stop to spammers and scrapers.

  25. I like your story, but contacting the ISP doesn’t always work as described. I had someone steal my original work (a story on electronic signatures) and tried to work with GoDaddy to resolve. Their legal dept had 6 requirements to submit a claim and each time I filed they said a different requirement was either not met or not written to their unpublished standards. I would re-write the section to their new needs and they would come back that a different section was now in error.

    It was just a game they played to make it impossible to file legitimate claims. Eventually I gave up, frustrated and committed to never using their services. The spam/scraper site is still running my article without crediting the source.

  26. Akismet is really amazing. The bigger my blog gets, the more spam comments I get. Akismet saves me several minutes every day.

    At least there’s a bright side to seeing spam: if you don’t get spam, you’re a nobody. Keep plugging away until spammers find you, then you know you’re well on your way! :)

  27. I use Akismet, and I have an account at Tynt. Tynt generates a link back to your site when someone copies and pastes your content.

  28. For me, we must balance spam control against reader-friendliness when deciding how to keep spam out of our blogs’ comment sections.

    I do believe that captcha is not the best solution.

    That is because it increases the time our readers need to leave a comment, especially if the captcha is really hard to read. This can be really discouraging.

    As the ones asking others to do us a favour by leaving a comment on our blog, I think we should make it as easy and quick as possible.

    Along with Akismet, we as bloggers should also spend our own time (not just plugins or other software) controlling spam.

    Our readers have spent their time leaving comments on our blog. Don’t you think it is fair for us to reward them by making their experience more ‘reader-friendly’? Why can’t we spend at least the same amount of time monitoring our comment section?

    For me, there are always two sides of a coin that should be accounted for and thought about.

    If not, for me, we are just selfish bloggers.

    What do you think, Seth?

  29. Darren,

    It is much worse than you think. I’m in Asia and let me tell you an ugly secret. Your writings have been plagiarized, copied and reproduced in foreign languages without your prior knowledge and permission. Credits were not given to you at all!

    You should visit India, Indonesia and China more often.

  30. I don’t understand spammers… that’s such short-term thinking; what they do is really useless.
    Anyway, I try to protect my work and blogs with Akismet and Captcha, as you already mentioned.
    I always block spammers on Twitter, without question… I noticed we can finally retweet your valuable posts.
    Thanks.

  31. It hasn’t happened to me… yet… but if it did, I would visit the site and ask them to remove it or block-quote it and link back to my blog.

    Why do folks have to be so gosh darn lazy? :)

  32. Great post, Seth. I agree with the commenter above: the best approach is Akismet with occasional review.

  33. A good way to avoid scraping is to serve only partial post feeds. If the content is good enough, RSS readers will visit the site and read the article.

  34. I agree that captchas are annoying… logging in to a site in order to comment is even more annoying… but it depends on the blogger, I guess.

    I prefer a close monitoring of the comments because then I can read them and respond if necessary. Akismet is the best way to deal with spam.

  35. We’ve had our site scraped, and my nice emails were ignored. The content was finally removed when I sent cease-and-desist letters citing the US Digital Millennium Copyright Act (DMCA) to the web host and to the ad networks used on the person’s site.

    Once you find who is hosting the site using whois.com, a decent web host will have explicit instructions on how to file a copyright complaint.

    I figured if they’re making money on my content, they wouldn’t do anything until their host or revenue was shut down. I’ve heard that Google will ban someone from AdSense if they steal content (violating the terms of service).

    Akismet works well for us, and you do have to scan it occasionally.

  36. Dries Buytaert says: 10/20/2009 at 7:41 am

    Has any of you used Mollom (vs Akismet)? If so, I’d love to get your feedback on that. (Disclaimer: I’m a co-founder of Mollom.)

  37. Luckily I have never had my content stolen, and hopefully I won’t, but if I do I will know what to do!

  38. Nice information; it will surely help with my various websites that have comment boxes and such :)

    Thanks a lot, I shall be reading a lot of your blog posts from now on ^.^

    Thanks, David Macaulay
    http://threerelics.com/

  39. Bad Behavior (http://www.bad-behavior.ioerror.us/) has been key in helping eliminate spam. Get a Project Honey Pot API key (http://www.projecthoneypot.org/); it works with Bad Behavior to prevent spam. This way you don’t need to use one of those nasty captcha scripts.

  40. I’m not too bothered by auto scrapers that leave your brand and links intact; imo these help your site. One such scraper drives a good 200-300 hits a day, so he must be doing something right!

    I think people should ask whether the scraper is harming your site or whether this is about a “principle”. I think the former is worth fighting for, but if it’s the latter, you may well have better things to do with your time.

    I’d be a little bothered if someone built a site around my content, but I suspect it would be a waste of time as the big G will know where it saw the content first.

  41. Reviewing some of the spam comments we get is always a treat. I’m tempted at times to approve some of the more creative ones.

  42. I use Akismet to stop spammers because I do not feel comfortable with captcha. I think many readers find captcha annoying.

    Scrapers are more difficult to deal with. There are really many scrapers out there using auto-blogging. I think blocking their access is the best way, but blocking them through .htaccess is complicated. We could use an IP-blocking plugin for it instead.

  43. I had to deal with this problem unfortunately quite a few times and while being nice works…. sometimes…. I found that those who steal your content in order to profit understand only one thing – hit them where it hurts, source of their income.

    Filing a DMCA complaint with Google or Yahoo, two ad networks commonly used on those blogs, does wonders.

    Alex

  44. An interesting and useful post, and thanks to the folks who have brought up Tynt!

    I wanted to point out that although the people who steal your stuff are the really annoying ones, a lot of your content is leaving your site from your fans. We don’t promote ourselves as an anti-plagiarism tool (although if we can help that way then I am happy) but rather as a way to try to encourage anyone who is distributing content from your site to properly link back to the source content.

    A couple of interesting notes. Of the over 5 billion page views we are monitoring every month, we are tracking over 100 million unique copy events. We find that 2-6% of all page views result in a copy. If we extrapolate Compete’s traffic information for the ProBlogger.net domain for example, we would expect to see ProBlogger readers copying content over 40,000 times every month! Obviously most of these aren’t scrapers, but ProBlogger fans who are liking what they read and spreading the word via email, Facebook, and other means.

  45. I have had to send emails quite a few times to bloggers who have re-posted my articles, word for word, without permission or even a reference to me or my site. It seems to be common that new bloggers cut and paste without thinking about how it affects the original writers. Each time I’ve contacted a blogger, I not only ask them to remove the content, but I also include concrete examples of the appropriate ways to link and refer, so I feel like I’m helping and not just demanding.

  46. Thanks for CopyGator! Will try. But who would steal my content anyway :)

  47. I dunno, when I see spam on my blog I cry and weep and wail and gnash my teeth. I don’t think it’s any less effective than what you’re proposing.

  48. Some good pointers. Captchas are a good one to use. They will at least block all the bots.

  49. Captcha has been problematic for me, so I just monitor my comments and moderate the ones on posts older than 14 days. But I don’t get that much spam, so I must not be as popular.

    I have had content from my entire site scraped. The scraper left no contact info and ignored all polite requests. Fortunately, they set up on Blogger so a DMCA report to Google was quick and easy. Google took them down in a day.

    Regarding blocking, the .htaccess trick seems like a cool technical thing to do. One other suggestion is, if they are still linking to images on your host, to change the image but keep the same filename. The new image can have a message like “this site is stealing content from so-and-so.”

    Very useful post!

  50. I agree with Thomas on FairShare: it’s free, and all you need to do is submit your feed to get results. I love FairShare because it highlights the passages that were copied.
