Stop Scrapers and Spammers Fast

Posted by Darren Rowse, 20th of October 2009, in Miscellaneous Blog Tips

One of the challenges bloggers face is what to do when others try to use your blog for their own gain, either by taking your content or by spamming your comments section. The more I talk to bloggers about how they deal with these issues, the more I realize how many different approaches there are to the problems. Today Seth Waite from Blogussion shares his approach. I’d love to hear your approach (whether it be different or the same) in comments below.

Every blogger quickly learns the reality of hard work in blogging. After the “make money fast” hype has worn off and the reality has set in that blogging is a great way to earn an income if you work for it, you are left with a choice.

The choice is whether to stay in blogging or not. Many bloggers decide to stay but are then left with another extremely important decision: should I put the effort into becoming a great blogger, or just keep trying to do things the easy way and hope things will be different for me?

Those choosing to work hard begin the process of learning and eventually find success by learning, networking and earning their way to better blogging. Bloggers who are unwilling to face reality either quit or eventually become spammers, scrapers, or beggars.

I am not going to address the problem of bloggers who beg for help without working for it, but I do want to talk about spammers and scrapers. Most importantly, I want every hard-working blogger to know how to stop selfish bloggers from disrespectfully using your work for their own benefit.

Stopping Spam

The easiest way to stop spammers who are trying to get you to link to their blog/site is by controlling your comments and trackbacks. Although essential to building a great blog community, comments must be moderated to ensure your actual readers feel comfortable with the discussions on your blog.

Captcha

At first, comments were easily controlled by requiring commenters to enter an email address in the comment form. Spammers quickly got around this, and now a very easy way to stop them is to add a captcha to your blog comments.

Captcha is already used by Blogger and is easily added to WordPress and other blogging platforms with plugins. It works by making you type a series of numbers or letters shown in an image before you can post your comment. Other systems ask you to add numbers together or answer another easy question. Using captcha is a quick and easy way to minimize your blog’s spam, but it may also be annoying to regular readers.
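To make the "easy question" variety concrete, here is a minimal Python sketch of an addition challenge. It is an illustration only, not how Blogger or any particular plugin implements captcha; the function names make_challenge and check_answer are made up for this example.

```python
import random

def make_challenge(rng=None):
    """Return (question, expected_answer) for a simple addition challenge."""
    rng = rng or random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)

def check_answer(submitted, expected):
    """Accept the comment only if the visitor's answer matches."""
    return submitted.strip() == expected

# Show the question next to the comment form and keep the expected
# answer on the server side (hypothetical session storage, not shown).
question, expected = make_challenge()
print(question)
```

On a real blog the expected answer would have to stay on the server, for example in the visitor's session, and never be sent to the browser along with the form.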

Plug-ins

For many blog platforms, like WordPress, a simple plug-in will solve many of the spam problems. The most common spam blocker is Akismet, which is now available for over 20 other blogging platforms besides WordPress. Using this plug-in on your blog is simple and requires only that you check occasionally to make sure legitimate comments are not being counted as spam. In addition to the normal comment protection it provides, it goes above and beyond captchas by protecting your blog against unwanted trackbacks.

Stopping Scrapers

Scrapers are bloggers who steal content you produced and put the entire work on their own blogs and websites. The practice is sadly common and creates reproductions of your content around the web. Luckily most search engines are good at recognizing the original content, but scraping is illegal and damaging to the blogger and blogging.
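To give a rough sense of what counts as a "reproduction", the sketch below measures how much of one text reappears, in the same word order, in another. It is a simplified illustration using overlapping five-word sequences; it is not how search engines or any duplicate-checking service work, and the function names are made up for this example.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def copied_fraction(original, suspect, size=5):
    """Fraction of the original's word sequences that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A value near 1.0 means nearly the whole post was copied; a short quoted excerpt inside a longer article scores much lower. Where to draw the line is a judgment call.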

  1. Identify: The first step to stopping scrapers is identifying your content and checking for copies. An easy way to do this is by using the sites CopyGator or Copyscape to check for the originality of your content and any potential duplicates.
  2. Ask: Once you have found scrapers who have copied your material [note: the content duplication should be significant and their reasons should be to represent your content as their own, not to promote yours], email the owner or comment on the blog/site where the duplicate is found. In most cases the scraper will take it down and apologize for misrepresenting the work. Always try this first so that the blogosphere can stay friendly and young bloggers who might be making an innocent mistake will learn without being accosted.
  3. Block: The next step, if they are unresponsive or belligerent to your requests, is to use .htaccess to block the scrapers from your blog. This can be a little bit tricky for anyone who has never done this before, but here is a great link to learn how to stop scrapers [item #9]. Basically, you are blocking the scrapers from accessing your blog and RSS feed.
  4. Take Action: At this point you have been nice, notified them of their misdeed and blocked their access, and still the content is ripped off and on their site. The next way to get your content off of their site is by contacting the site’s ISP or hosting provider. The easiest way to find that out is by using Who.is and just inputting the site’s web address into their search bar. The hosting information will then show up with the rest of the site’s information. Once you have the host information, contact them with a formal letter or email stating specifically what the content is, where it originated and where it has been reproduced. The host will then quickly take down the content and offer the site owner a chance to explain themselves. Be warned: this is serious for everyone involved, so do not use it lightly. If this does not work, there is one more option: legal action. Whether a suit can be pursued depends on the scraper’s home country and legal system.
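For the Block step, here is a small Python sketch that turns a list of offending IP addresses into the kind of deny rules you might paste into an .htaccess file. It assumes an Apache server that accepts the older "Order / Deny from" directives, and it is not necessarily the method described in the linked article. The helper name htaccess_block_rules is made up, and the addresses shown are reserved documentation addresses, not real scrapers.

```python
import ipaddress

def htaccess_block_rules(addresses):
    """Build Apache 2.2-style deny rules for the given IP addresses."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for raw in addresses:
        # ip_address() raises ValueError on anything that is not a valid IP,
        # so a typo never ends up in the generated rules.
        ip = ipaddress.ip_address(raw.strip())
        lines.append(f"Deny from {ip}")
    return "\n".join(lines) + "\n"

# Example addresses only; replace them with the IPs found in your server logs.
print(htaccess_block_rules(["203.0.113.7", "198.51.100.24"]))
```

Newer Apache versions express the same idea with Require directives, so check which syntax your host supports before pasting anything in. Blocking by IP only helps while the scraper keeps fetching your pages or feed from that address.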

Stopping scrapers and spammers will not only protect your work but also encourage the internet to be a better place. Every time a spammer is thwarted, other bloggers win too. So be an internet community builder by taking the proper steps to stop content thieves.

Seth Waite is Editor at Blogussion.com and enjoys helping every blogger reach their blogging goals. To contact Seth directly, just find him on Twitter @Seth1492

What’s Your Approach?

From Darren: as mentioned in the introduction to this post – there are many stances that bloggers take on these issues, particularly when it comes to scrapers. Many take a similar line to Seth while others are more lenient and take the approach that as long as someone’s reading their content somewhere, it doesn’t worry them. What do you do? What tools do you use?

About Darren Rowse
Darren Rowse is the founder and editor of ProBlogger Blog Tips and Digital Photography School. Learn more about him here and connect with him on Twitter, Facebook and LinkedIn.
Comments
  1. Hey Darren, you need an editor for these things. The title is right, but “scrapper” is used throughout the article instead of “scraper.”

    The stopping spam tips are nothing new at all. He doesn’t even cover what you’re doing now – closing comments after a given period of time. There are so many techniques to use to combat spam that there have got to be more plugins for WP to implement them (e.g., using CSS to hide honeypot fields, profiling a comment and creating a spam score based on other criteria such as mouse movement, keypress, referrer, etc. – each of which should not be used as a criterion alone to reject a comment, but together can be used to at least flag it for moderation).

  2. Akismet is good to stop spamming but sometimes it goes down to stop spammers.
    I am facing the prob.
    But the copygator can do many things for us.

  3. You know for me blogging is more about having fun and meeting terrific people than it is about spamming, trolling and scrapping. It’s sad that such an issue as to even be an issue. Get it done the right way or don’t do it at all.

  4. Can spammers and auto-scrapping bots get past Captcha – I would not think so. Most people are now aware of Captcha and although it may be a bit of a pain, they realize the importance of it.

  5. Some really useful info here, I have implemented these tips to stop scrapers on my website and am now waiting to see what gets caught in my spider trap. I am a little confused about how to do the IP blacklist for Apache. Tried to follow instructions on honeypot.org, but got a little lost on that, i found the module for Apache, but am not sure how to go about it. Hopefully the honeypot and htaccess rules will help for now.

  6. Why would you want to stop scrapers? Is what you say not important enough to warrant additional distribution? Are you on so much of an ego trip that the important thing is people know the source of your ideas, rather than being exposed to the ideas themselves? Are you too stupid to outrank a scraper site? Are you too stupid to MONETIZE being scraped? You know, if the scraper takes the post wholesale, then affiliate links are intact. Link to yourself with good anchor text, and you have a scraper adding perfectly anchored links to you.

    I love havnig my sites scraped. It’s just one example of how there’s a conspiracy to make me successful.

    Here’s the deal, if you REALLY are offended by people taking your work and distributing it for you, stay off the internet. Apparently you aren’t expressing any ideas that are worthy of widespread attention anyway, since you want to limit your distribution.

  7. As a matter of fact, you can not avoid all of the spams, so I think Akismet is enough for WordPress, the more plugins you used, the worse your readers will feel. For the scrappers, I will fight with them whatever I can do if they really angry me.

  8. Good article, when ever you find duplicate content just intimate it to google then they ll ban the site form google search engine.

  9. Thanks a lot Darren for this useful post.

    I must say that I truly enjoy reading your blog as I am starting out blogging myself and have found many of your postings truly useful.

    Even postings as far back at 2007 or 2008 are still relevant today.

    Thanks for providing all the insight!

  10. I’m not worry about comments, as using akismet and a bit of time to moderate do the work.

    My main concern is when it comes to copying my content, that’s why I found first point pretty useful.

  11. Didn’t know about “blocking”. That’s good advice. I’m surprised Google hasn’t come up with a better system to prevent scraping. Its a major pain to serious bloggers.

  12. Thanks for yours information you have given step by step points to stop spamming and scraping.like
    Captcha
    Plug-ins
    Identify
    Ask
    Block
    Take Action
    As a blogger every one is getting disturbed from this spammer and scrapper they steal your stuff.most of the spammer using social networking sites for the spamming
    i think your information may stop spamming.
    thanks

    .

  13. I use Fair Share which sends me a report and a link to the scrapper site via feeds. I have had two unattributed copies – in both cases by beginner bloggers who didn’t know better. But most scrapper blogs take just a portion of my post but they do add the links.

    I don’t understand why they would even take some of my jewelry making posts for a pseudo gardening or even construction site!!

    I also place a link signature at the end of each post.

  14. This is a timely subject for me. Akismet catches spam on my blog. I’ve been blogging nearly 2 years, but I didn’t realize until a couple of weeks ago that I need to check the spam regularly to despam comments that shouldn’t have ended up there. A big “Duh!” I know. I’ve blacklisted dozens of IP addresses, porn words, and maybe two thirds of the pharmaceutical drug names known to mankind, so my spam filter is fairly tight as a result. I simply need to check regularly for those good comments, since I get so few.

  15. Scraping is the most disruptive of the 2 problems. Spam is now becoming more and more easy to stop, especially with WordPress plug-ins.

    Scraping is also better, but the challenge is when overseas bloggers take content.

    Most importantly… if you take the easy way out, you will never succeed in any venture.

  16. Spammer are going headache for blogger like us. I really feel bad when someone just put “great post” on comment. I now put disqus comment form on my blog and moderate comment before they publish. I think every blogger should have moderate comment before they are published.

  17. First, thanks for the post and the links to Copyscape and CopyGator. I’ve only had this happen to me one time (that I know of), but those sites will be a great resource to make sure it doesn’t happen again. The one time it happened was an interesting lesson.

    I have Google alerts set to anything related to “film music,” “film scores,” my website or my name. Within minutes of my post appearing on another site (I think it had more to do with the timing in relation to Google’s alert schedule, i.e., mid-day), I got a notice and was able to see that the entire post had been copied verbatim.

    Now, I WAS given credit as the author of the original post, but it had been copied and pasted directly on this other site. I emailed the site owner, thanked him for his kind words about the post, but asked him to provide a direct link back to my site. This was for two reasons: 1) for the traffic it would drive to my site, and 2) to alleviate any possible ramifications with Google rankings (I’m not sure if there are any, but I don’t want to take any chances). He did it without any problems and the whole transaction was very smooth. I realize this isn’t exactly a scraper or spammer per se, but it does show that sometimes innocent mistakes are made and can be resolved very easily.

    Hopefully that’s the worst that’ll ever happen with that. :)

  18. I am also facing Commenting Problem in my blog.Allot of members are coming to post there Links and nothing.
    They don’t share any thing related to the post.But they are just leaving there links.

  19. nice article
    i have previously tried to contact google and report scrappers but got no response at all , even when they were copying whole documents without changing anything in them

  20. Most scrapers simply copy your whole post (links and all), or grab your rss feed. If you make sure you link back to your blog in the body, or even at the end of the article then many readers will find their way back to you (people love to click).

    Also if you use affiliate links, or you’re promoting your own product, then the scrapers are helping you:-) I’ve made several sales where the clicks came through from a scrapers site.

    If you can’t stop them, just make sure when they steal your content it benefits you in the end. Beat them at their own game.

  21. Block the scum. I use my cPanel and block them from my entire site. Saves time since I have 6 blogs and adding the IP to all of the .htaccess files takes more time. I have reported them to their provider, but I doubt that does any good. Oh, and it’s a no-tolerance policy on my blogs. Spam it once, and you’re done.

  22. You have to take content moochers on head immediately. I have a zero tolerance policy for this.

    Commenting is a big problem for me lately. I use Akismet on my personal WP blog, and approve comments on client Typepad accounts now, because of the massive amounts of spam and troll-like activity. Some are spam (especially on some client blogs), but others are spam without the poster realizing it.

    For example, authors leave comments at my personal blog that are nothing but the summary of their book and a link to buy on Amazon. There is no contribution to discussion; just an advertisement. Some authors I’ve talked with don’t realize this is really spam. I delete these. A few publishers have left comments about reviewing their books (usually vanity presses, but not always). If someone leaves a comment and says specifically that they could not find my contact information, then I’m a little more lenient, depending on the request.

    What is most frustrating about comments are publications that don’t track them – especially newspapers. There are obvious spammers who use the same handles at different sites and copy and paste the same exact comments at those publications. Yet, news publications do nothing to stop this. These are also the folks who usually attack anyone who leaves an opposing comment – no matter how nice that opposition is in the writing.

    It is also frustrating when sites allow trolls to take over. I’ve since stopped reading a few blogs and other sites because of the lack of troll control. I think it is disrespectful to readers to allow that to continue.

  23. There’s one technique that I was surprised not to see here: reporting scrapers to Google AdSense. The majority of the times I’ve found sites running my content without attribution, they had Google AdSense running in the sidebar. These scrapers are trying to make money, so hitting them there hurts a lot more.

    AdSense even has an online form where you can report scrapers. (https://www.google.com/adsense/support/bin/request.py?contact_type=dmca_complaint).

  24. Aksimet is the best. I am glad you don’t use Captcha. I understand its purpose, but I really hope that blogging does not come to that.

  25. As a relative newbie to blogging, I am amazed at the length some spammers will go to post garbage on my blog. Multiple names, email addresses and such. I often wonder why they spend valuable time trolling the internet when they could be doing legitimate business elsewhere. I edit or delete suspicious submissions and I often check who.is for more data.
    As to the topic of scraping, this is news to me and I will go read up on that one. It sounds pretty sad that some people do such things especially when there are other creative options one can explore… I add a copyright mark and my name/signature to my posts and will explore some of the excellent suggestions made here. Who knew?
    Cheers,
    E

  26. If the scraper has Google Ads on his site, you can get Google to suspend his account for stealing your content.

    Just click on the ‘Ads by Google’ icon on their site.
    Click on the “Send Google Your Thoughts” link
    There you can check the violation checkbox that says “The site is hosting my copyrighted content”

  27. I use akismet plugin and it’s really helpful,though im still open for new ideas.

  28. I use Akismet for comment spam, and I’m very happy with its effectiveness. It has filtered out all spam comments quite effectively.

    My site gets scraped regularly. If I have time, I run Copyscape checks. I hate that I have to do this policing. I’d much rather be blogging or almost anything else. I’m not a famous blogger, but I can imagine that your scraping problems are astronomical if a small blog like mine gets scraped so often.

    Thanks for the info about blocking .htaccess. I wasn’t aware that could be done.

  29. Before reading this article I though akismet is not good for spam blocking but thanks for sharing this information with us and from now onward I use that plug in.

    Great article..

    Alam

  30. You can NEVER go wrong with unique content obviously. It’s something the net, and Google are dying to index, yet few have. On the other hand, I don’t mind re-publishing of articles if the original author gets their credit and they put the article somewhere decent on their site.

  31. Thanks for sharing this information. I recently encountered a situation where a company scrapes content from various sites and uses it to put up a “directory” of a city. They charge others $1000 to advertise on this “directory.” In some posts, they put a link to the original site. In others, they don’t.

    I asked for my content to be taken down, pointing out the illegality of using my copyrighted content without permission. They obliged.

    It still infuriates me that they are profiting from plagiarized content. They even have the nerve to put their own privacy policy and copyright notice on their site.

  32. My good friend Si Dawson just designed a great Twitter app for getting rid of spammers on Twitter. It’s called Twit Cleaner and gives you a very detailed report of people you need to block or unfollow etc. It’s such an awesome spam solution. http://www.twitcleaner.com :-)

    I think there’s a growing trend of people that want to rid their blogs & social media sites of the spammers once and for all.

    Cheers,
    Sarah ;)

  33. Tips: the fastest way to block these spammers is “htaccess”.
    If you know how to use them…

  34. There is a fundamental problem with CAPTCHA, that it is in most cases not accessible for people with disabilities. There are certain solutions which might work, but in general a person who is blind cannot enter text from an image, but with their screen reader they are able to solve a challenge question. There is something that’s called an audio CAPTCHA which they can use, but it is still not helping deaf blind people. CAPTCHA in most of its forms is also creating a challenge for people with cognitive disabilities.

    It is not to say don’t use verification, but use it wisely, put yourself into the shoes of people with different disabilities.

  35. Capcha features make commenting impossible for blind readers. I use capcha on my blog, but I’d love an alternative. It’s just not accessible.

  36. This was an excellent article, and I love when I see people blogging about it.

    It is so important to add content when blogging, time and time again, i’ve seen people put a “Great Article” with their link after it.

    They added nothing!!

    Nothing at all!!

    My favorite ways to prevent this is to use good old fashioned “Moderate” each comment, and “Capcha”, both should keep the spammers out. I’m wondering what you’ll have as I push the “submit” button.

    Anyways, powerful article on scrappers as well. I haven’t seen a lot of duplication, but I have seen tools that take blogs, articles, etc, and spin them into a completely different article, just saying it in a different word.

    Not that I do this, but the tool operated by reading a few words at a time, and changing the order of the words around, and you chose. So you can spin like 10 or 20 articles from just 1.

    Subject at hand is not to Scrape, period. It’s wrong. But your advice is great. Contact the person first, to get their side of the story.

    See you on the net,

    William Whitlow
    http://www.williamwhitlow.com

  37. I use Mollom, a great module which can be installed on all major blogging platforms (Drupal, WordPress, Joomla). It has some very innovative features like:
    – multilingual support for identifying spam;
    – CAPTCHAs & text-analysis filters which really work;
    – “crowdsourcing”. Basically all sites protected by Mollom can report comment spam that slipped through the cracks. Mollom combines and correlates this information and learns from it to help prevent future abuse.

    Been using it with great success on http://www.7tutorials.com
    Now I no longer need to use comments moderation. I only get 1-2 spam messages slipping/month which can be easily deleted.

  38. I felt your effort of posting such useful links for us. Extremely well done and your choice of language makes them even more interesting. An occasional bit of irony or an elegant twist in the phrase is a welcome relief.

  39. spammers are always annoying for me…but i have got rid from it by some wordpress plugins

  40. Thansk for the Great article! It’s perfect for those of us that are still learning to be effective bloggers.
    Fighting spam sucks and I’m thankful that WP offers the necessary plug ins.
    I would have to agree partly with some of the comments above that scrapers can actually work in your favor.
    To me, it is simply a way to spread your message and as long as you link your article correct, it builds external links for you.
    Do I understand this correct or am I missing something?
    Hmmm?

  41. Thanks for some great tips. I love Akismet, and really love the “Delete All Spam” button. Boom! All gone!

    I didn’t know you could block it with the .htaccess file, so that’s a great tip. Haven’t had to use it (yet), but thanks for that tool to put in my toolbox!

  42. Thank you, Seth. This was both frightening and instructive. I feel older and wiser for your words. Best regards, P. :)

  43. Excellent info. I ran into a situation a couple months ago where someone scraped and entire post of mine and put it on their site, used exactly the same title and content, and then sent out a Tweet about it — using exactly the Same Title in the Tweet. Rather infuriating. Your post is very helpful for addressing this. Thanks.

  44. Aren’t spammer looking for do follow blogs only? If I make my blog no follow, will it reduce comment spam? And isn’t wordpress blog no follow by default?

  45. Crazy blogger: WordPress is Nofollow by default, but going nofollow does not mean no comment spam. Most of my blogs are nofollow, but still receive a disgusting amount of comment spam attempts, which I suspect are mostly automated. Akismet and manual approval is still the best way to go!

  46. Before quitting I must say that you should start blogging as your hobby instead of earning money. Because if the aim of your blogging will be money than its for sure that you will get bore after three or four months when money will not come.

    So in the initial period simply enjoy blogging and after establishing good readership you can try to earn money.

  47. Thanks for the additional information in case I ever have to go to the next step.

  48. I have my comments set so that I have to approve every first time commenter. After that, everyone who has been approved can comment freely.

    I use Akismet. I used to use Bad Behavior but turned it off because I was having a problem with MSN not indexing my site (when it was MSN).

    I haven’t seen my content turn up on other blogs yet except for the occasional blog who will post the title of my post and an excerpt with a link back to my blog. Still I try to interlink my posts and I use the WP Pluging copyright feed that puts a little notice at the bottom of my feed that states if the person is not reading the post in a feed reader then the blog they are reading is committing copyright theft. It’s not much but something is better than nothing.

  49. in the “link builders bible 2010”.
