This is a guest contribution by Felipe Kurpiel, an internet marketer.
I came across this topic by accident. One day, while monitoring my analytics data, I noticed a big drop in my traffic stats and didn’t understand why.
Actually, I had a hint, because I had just started interlinking my posts. That gave me a clue that the problem was internal, which I thought was a good thing. But that was not enough, because I then had to analyze what Google is focusing on now.
If you have been involved with SEO at all, you know that duplicate content is a bad thing. But how can you identify the duplicate content on your site?
OK, let’s get started.
Identifying Internal Duplicate Content!
This is a little advanced, because we are about to crawl our website the way Google does. That is the best way to analyze the source of any problem.
To do that I like to use a free tool called Screaming Frog SEO Spider. If you have never used this tool it can seem a little complicated, but don’t let that scare you.
You just have to follow a few steps. You can analyze a lot of factors with this tool, but for our example we are only considering duplicate content.
First Step: Enter your website URL into the software and let it run.
It can take a while depending on how big your website is, but after that we are ready to filter what we are looking for.
Second Step: Go to the Page Titles tab and then filter by Duplicate.
If you are lucky, this filter will return no results. Unfortunately that was not my case: I saw dozens of results, which was proof that my website had internal duplicate content.
Third Step: It’s time to analyze what is generating the problem.
You can do this in Screaming Frog, or you can export the file to Microsoft Excel (or similar) to analyze in depth what you have to do to solve the issue.
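If you prefer a script to a spreadsheet, the same analysis can be sketched in a few lines of Python. This is only an illustration: the column names "Address" and "Title 1" are assumptions about how the exported file is laid out, and the sample rows are made up. The idea is simply to group URLs by page title and keep the titles that appear more than once.

```python
import csv
import io
from collections import defaultdict

def duplicate_titles(csv_text, url_col="Address", title_col="Title 1"):
    """Group URLs by page title; keep only titles shared by more than one URL."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[title_col].strip()].append(row[url_col])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}

# Hypothetical export data, made up for illustration only.
SAMPLE = """Address,Title 1
http://example.com/my-post/,My Post
http://example.com/my-post/?replytocom=12,My Post
http://example.com/my-post/?replytocom=15,My Post
http://example.com/about/,About
"""

dupes = duplicate_titles(SAMPLE)
for title, urls in dupes.items():
    print(title, "is used by", len(urls), "URLs")
    for url in urls:
        print("  ", url)
```

Looking at the URLs that share a title usually makes the pattern behind the duplicates easy to spot.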
In my case, the duplicate content was being generated by comments. Weird, isn’t it?
That is what I thought too. I also noticed that the pages with comments were being flagged by Google, because they had disappeared from search results.
When that happens, you have no choice but to fix the source of the problem.
Understanding Comments
Every comment on my website was generating a link carrying a URL variable named “?replytocom”.
You don’t need to understand exactly what this variable does but, to put it simply, each comment on your posts can create a copy of that particular post on your site. It can be considered a pagination problem. And that is terrible, because when Google crawls your website it sees the same content being repeated over and over again.
Do you think you are going to rank with that blog post? Not a chance!
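To make the problem concrete, here is a small Python sketch showing that several comment URLs are really the same page once the replytocom variable is removed. The URLs are hypothetical examples, and the function name strip_replytocom is just a label chosen for this illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_replytocom(url):
    """Return the URL without its replytocom variable and without the #fragment."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "replytocom"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Hypothetical URLs, made up for illustration only.
urls = [
    "http://example.com/my-post/",
    "http://example.com/my-post/?replytocom=12#respond",
    "http://example.com/my-post/?replytocom=15#respond",
]

unique_pages = {strip_replytocom(u) for u in urls}
print(len(urls), "crawled URLs,", len(unique_pages), "real page(s)")
```

Three addresses, one piece of content: that is the duplication a crawler sees.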
How to solve this problem
More important than identifying this issue is finding a clear solution to get rid of this pagination issue.
There are two ways to deal with this variable. The first is really simple but not always effective; the second may look complicated, but it is really the ultimate solution.
But let’s cover the easy solution first.
I run my blog on WordPress, and one of the few essential plugins I use for SEO is WP SEO by Yoast. If you are using this plugin, just go to the plugin dashboard and click on Permalinks. Then check the box to “Remove ?replytocom variables”.
This is really simple, but sometimes you won’t get the results you are expecting. Even so, if you are having this kind of problem with comments you MUST check this option.
After that, you can crawl your website again with Screaming Frog to see if the problem was solved. Unfortunately this can take a while, but if after a day or two you are still seeing duplicate content problems, you have to try the second option.
Second Option
Now we just have to access Google Webmaster Tools and select our website.
Then under Configuration we must go to URL Parameters.
We will see a list of parameters being crawled by Google. Here we also have the chance to tell Google what to do when a particular parameter is affecting our website. That is really cool.
For this replytocom problem I just have to click Edit and use the following settings.
Click Save and you have solved the problem!
Now, if you tried the first option using the plugin, then used Webmaster Tools to tell Google what to do with this parameter, and after a few days you still see duplicate content, there is one more thing you can try!
Now I am talking about robots.txt!
Don’t worry if you don’t have this file on your website; you just have to create a text file named robots.txt and upload it to the root of your domain. Nothing that complicated!
Once you have created this file, you just have to add a few lines to it.
If your robots.txt is blank, just add these lines:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: *?replytocom
If you already had this file, just add the final line: “Disallow: *?replytocom”
This should keep crawlers that respect robots.txt away from the replytocom URLs!
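Before uploading the file, you can sanity-check which paths these Disallow lines would block. The Python sketch below is a simplified illustration, not a full robots.txt parser: it only handles the two wildcard conventions Google documents for patterns (“*” matches any sequence of characters, a trailing “$” anchors the end of the URL) and it ignores Allow lines and rule precedence. The sample paths are hypothetical.

```python
import re

# The Disallow patterns from the robots.txt above.
RULES = ["/wp-admin/", "/wp-includes/", "*?replytocom"]

def rule_to_regex(rule):
    """Translate a Disallow pattern: '*' matches anything, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(path, rules=RULES):
    """True if any pattern matches the path, starting from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

# Hypothetical paths, made up for illustration only.
for path in ["/my-post/", "/my-post/?replytocom=12", "/wp-admin/options.php"]:
    print(path, "->", "blocked" if is_disallowed(path) else "allowed")
```

The point of the check is that the normal post URL stays crawlable while every replytocom copy of it is blocked.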
Final Thoughts and Monitoring
The best way to avoid this or similar problems is to monitor your data. So here are my three tips to keep your website Google friendly.
- When working on-page, be careful with the settings you are using in the Yoast WordPress SEO plugin. Don’t forget to review the Titles & Metas tab and check the “noindex, follow” option for every little thing that can be considered duplicate content.
An example is the “Other” tab, where you MUST check this “noindex” option so your Author Archives will not be seen as duplicate content when Google crawls your site. Remember, you have to make your website good for users and for search engines.
- At least twice a week, analyze your traffic in Google Analytics. Go to the Traffic Sources tab, then Search Engine Optimization, and keep an eye on Impressions.
You should also use an additional tool to track your keyword rankings, so you can see whether your search engine positions remain intact or some of them are dropping. When that happens, you will know it’s time to take action.
- Every two weeks, use Screaming Frog to crawl your website. This is an important way to check whether the changes you made on-site have had the impact you were expecting.
When it comes to duplicate content, the most important tabs to monitor in Screaming Frog are Page Titles and Meta Description. However, to have a website that can be considered Google friendly, it’s vital to analyze the Response Codes as well and eliminate every Client Error (4xx) and Server Error (5xx) you identify when crawling it.
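For the Response Codes check, the rule is simple enough to sketch in Python: status codes from 400 to 499 are client errors and codes from 500 to 599 are server errors, and both need fixing. The crawl results below are made up for illustration, and the helper name error_class is just a label chosen here.

```python
def error_class(status):
    """Name the error class of an HTTP status code, or return None if it is not an error."""
    if 400 <= status <= 499:
        return "Client Error (4xx)"
    if 500 <= status <= 599:
        return "Server Error (5xx)"
    return None

# Hypothetical crawl results as (URL, status code) pairs, made up for illustration only.
crawl = [
    ("http://example.com/", 200),
    ("http://example.com/old-post/", 404),
    ("http://example.com/moved/", 301),
    ("http://example.com/broken/", 500),
]

to_fix = [(url, code, error_class(code)) for url, code in crawl if error_class(code)]
for url, code, label in to_fix:
    print(code, label, url)
```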
Felipe Kurpiel is an internet marketer passionate about SEO and affiliate marketing. On his blog there are great insights about how to rank your website, link building strategies and YouTube marketing.

Thank you very much for this post. I had never tracked that through Screaming Frog. When I did, I found 5 posts being duplicated by that replytocom thing, and the WordPress SEO by Yoast setting solved the issue!!
Hey Gjivan… Screaming Frog is a great tool. And let’s face it… people talk about a website being Google friendly, but only when you crawl your site the way Google does do you get an idea of how friendly your website is.
Thanks for the compliment!
Hi,
My robots.txt file also shows “Sitemap: http://webfunspace.com/sitemap.xml.gz”. Is that OK?
Super smart tips Felipe! This underscores the need to create good, fresh and yep, unique content each time you create a video or post an article. Even if you use the same title – which can happen if you are prolific – create unique content in the post body and of course, link up to other sites to improve your SEO overall
I do create many video posts daily but always try to switch up the titles and topics. If I do create some dupes, I am at peace with that :), as long as I am churning out good content, using proper SEO and building my network, it is all good.
More than anything visit blogs daily, be active on your social networks and you will never lack for new, unique and fresh blog post ideas, helping you avoid Google’s duplication penalties.
Thanks for sharing!
Ryan
Hello Ryan!
Now I see how relationships with other bloggers can help with your own blog. When I first started I wasn’t aware of that and that was a big mistake.
Fortunately, it’s never too late to learn!
I am glad you like the post.
Thanks!
Is this a problem for Typepad sites?
Rita at The Survive and Thrive Boomer Guide
I am not sure about the platform used on Typepad, but if it’s running on WordPress it can be a problem. Unfortunately, I don’t know if you have all the flexibility of owning your own domain to fix that.
Thanks for posting, Felipe. I never gave a thought to this kind of duplicate issue on my site, since all my content is written from scratch by me.
But you have really given me a great insight into how to arrest duplicate issues on my site.
I will be sharing this with my network.
Well, unique content is great but you have to take care of the technical side as well, otherwise you will not see the benefits!
This is web 2.0 and we have to know a bit of everything!
Thanks Felipe this is some really useful information and hopefully this should help me resolve some issues with my own site.
I tried to prepare a step by step here to simplify this topic. It can be seen as complicated but if you follow all the steps I described I am confident you will solve this duplicate content issue.
Great info Felipe – I use Yoast as well and will be double checking my settings after leaving you a comment here :) Does screaming frog cost anything? I will have to check the titles and metas too. Great info! Thanks for sharing with us here on this topic of duplicate content. No one wants Google to show their wrath on their blog or website.
Hey Lisa.. here we are!!
Well, I love the Yoast plugin – it is free and it is really great.
Screaming Frog is an excellent tool. The interface is kind of intimidating, but on the other hand it’s a free tool. So there is no need to spend money to solve this!
Oh.. thanks for the mention and RT!
You are great!
Hello Felipe Kurpiel,
This is an awesome, informative post for me as well as for newbie bloggers who just want to eliminate duplicate content.
I’m in love with WordPress SEO by Yoast and I think it is a really great tool.
I think I should have a look at the Screaming Frog SEO tool. I heard that it is a very powerful tool as well as free to use.
Thanks for sharing Felipe
Have a wonderful day Ahead
This is an excellent tip. I just downloaded the Java runtime environment and Screaming Frog. Everything works like a charm. I am glad I didn’t find any instance of the duplicate issue after running through the filter panel.
Thanks for the interesting piece.
Duplicate content is really a big headache for every blogger and content writer. It’s important to analyze your blog from time to time to avoid any unwanted penalties from Google.
Thanks for sharing such nice tips.
Gaurav
FrankHouse
Thanks for the great tips. It’s important to understand what’s causing the problem so that it doesn’t happen again. Sometimes just fixing the problem isn’t enough.
I didn’t know that Google will penalize you for interlinking the links and eventually have duplicate contents. Thanks for sharing this very useful information.
Great post! I followed your instructions, and it appears that tags are creating the duplicate content. Your post did not cover how to fix that. Avil
Hey Avil… you can fix that using the right settings on Yoast plugin.
Just spend some time and find the option related to indexing Tags then just uncheck that box.
I hope it helps.
Thank you Felipe!
Nice share Felipe…
Having recently heard about replytocom URLs, I have configured them via GWT. But do replytocom URLs coming from other sites (as backlinks) harm our site in any way?
I didn’t know about this issue. I have my own blog, and now I have to check it for these duplicate issues. Google can penalize my blog due to duplicate issues.
Thanks for the information, Felipe. I use WordPress and Yoast, so this could definitely be an issue for me.
Great job Felipe. It’s good you are bringing this to our notice. Unfortunately, a lot of people get punished without even knowing they are. Your write up is an eye opener and it would help us to take care of the duplicate content stuff.
Thanks for your help. Remain blessed.
Thanks for the info Felipe!
I have to check my blog now! Great info that I just heard first here!
Thank you for the heads up…
A very useful how-to guide, but I couldn’t stand reading on for too long as the writing style (which was stiff and hard to connect to) and grammar was awful!
Sorry Felipe, but you made the most basic of spelling and grammar errors very regularly. I can understand a few here and there, but this was certainly well over that.
I’m surprised that this wasn’t picked up when it was sent to Problogger for reviewing!
I’m not using a WordPress blog right now, but I gather all useful WordPress-related tutorials to apply them to my future blog. Now this tutorial is also in my collection. Thanks for sharing this useful post.
Thanks for the easy to follow tips! Followed them all, so hopefully my growing site won’t get hit with the duplicate content stick. Very clear, very newbie friendly. Well done!
I did option #1, which left me with 3 urls showing duplicates – my main domain and two archive pages. Went into Webmaster tools – your instructions say go into Configuration but on my screen it’s Crawl ->URL parameters.
The message there reads “Currently Googlebot isn’t experiencing problems with coverage of your site, so you don’t need to configure URL parameters. (Incorrectly configuring parameters can result in pages from your site being dropped from our index, so we don’t recommend you use this tool unless necessary.)” which makes me leery to change anything.
Should I trust the Webmaster saying all is well or the Screaming Frog and your tutorial here and alter the url parameters? Thanks,
Can I follow all three steps, or should I stick to only one method?
I have checked the Yoast settings, added the parameter and even edited robots.txt.
Is this fine? Will it create any further problems?
Around 1600 links are indexed from my site. I need to solve them immediately.
Thanks
Awesome Tutorial,
But i think you’ve missed out a significant tip. It’s good to noindex Category and tag archives as well because the same post excerpt will be repeated over and over.
Great post! WordPress can be an absolute nightmare in terms of duplicate content – it’s really annoying. I guess the good thing is that about 95% of those duplicate content issues can be cleared up by using a few plugins & code hacks.
One of the more annoying duplicate content issues I’ve come across in WordPress over recent times is the fact it creates multiple versions of the homepage, i.e. yoursite.com/2/, yoursite.com/3/ etc. I still haven’t figured out how to stop it creating these.
Thanks again for a great post – I hope people take note!
I was never aware of the Screaming Frog tool. For me the ‘Let Google Decide’ option was working fine so far. But I decided to make the changes as you described because I think it will be better for the future. Thank you for the step-by-step guide. Very useful post.
hey nice post Flip
Well, it’s quite an informative one and helpful for bloggers who want to make their blogs more Google friendly… Screaming Frog is a nice tool for analyzing this on your blog, and it can help you out in handling these problems.
Quite informative and helpful article
Thanks for sharing
I think you are chasing scarecrows here.
I am sure I recently watched a video by Matt Cutts who basically responded to a question on this very issue by saying that Google did NOT give any penalty for duplicate content on a single site.
Google is “smart” enough to know how this can easily happen given the structure of modern blogging platforms.
Duplicate content issues arise when your content duplicates something that was published on other sites.
Thanks for the tips & link. Recently one of my blogs was hit and I saw a 90% loss in organic traffic. I was searching for a way to find duplicate content and your post helped. Thanks.
Thanks for this great post! I found I had issues with replytocom’s as well. I had no idea. I also noticed that I have the same url appearing twice in the duplicate results. One with anchor text and one without. Is this something I should be worried about?
I am not very aware of these types of issues. Maybe it’s because I have little knowledge of search engine penalties. Thanks for sharing, I will follow your instructions.
Even with internal linking, we have to take full care to do it in a way that Googlebot does not presume to be internal linking. Moreover, we place the author box very casually after each of our posts and forget to tick the no-follow option, so it is considered oft-repeated duplicate content. But we need to check all this through Screaming Frog before taking any action on it.
Thanks
Awesome tip for every blogger.
Thanks for the duplicate content tips. “ReplytoCom” makes it creepier. I have to remove them via GWT.
But as far as I noticed, if I remove the Disqus Comment System, the ReplyToCom issue occurs with more than 5K links, and I have to use the GWT method. Being afraid, I had to re-install the Disqus Comment System.
Any one else had this issue ?
I changed the titles and slugs of many of my WordPress articles and then pressed the update button. Now in my Webmaster Tools all of these articles are being reported as having duplicate meta descriptions and duplicate title tags.
My traffic dropped by 40% and I think this is the cause.
How do I fix it? Thanks, please help!
Thanks for sharing this useful tool. My blog is still too small for people to copy and steal. I’ll continue to write more posts on my blog and hopefully one day, someone out there will steal. That means I’ve produced good content.
This is really good piece of information. Thanks for taking time to post this.
Perfect post. I have just done what you suggested in WordPress SEO by Yoast.
I think it will not index the comments on my blog now, right?
Thanks
Be very careful what you do in Google Webmaster tools. I did exactly what the author said to do and got a message from Google on the same telling my top url lost 78% of its traffic. This post has a ton of comments as well as other top posts. My traffic is now cut in 1/2 of what it was. I am back pedalling right now and reversed what I did.
My reading of Google webmaster is it doesn’t care about duplicate content as long as it on your site. It knows what to do.
Focus more on people who scrape your content or ones that you gave permission to use your feeds.
Oh and to add. WordPress excludes indexing of threaded comments (replytocom). It has code in it that says nofollow and noindex. If you use Screaming Frog, you will see this in “directives.” This is how I found out that you really don’t need Yoast Plugin for this or change your parameters. WP took care of it.
However, that being said, Yoast did find that due to that code, you lost link juice from those that submit comment. So, he upgraded his plug-in to strip out the WP code.
I know this all sounds counterintuitive, but I guess his plug-in takes care of the duplicates but in a good way.
Also, in 2009, Google said don’t fool with your robots.txt as to this matter. See http://googlewebmastercentral.blogspot.com/2009/10/reunifying-duplicate-content-on-your.html. They advised other ways to stop the duplication. Please read.
Honestly, everything Felipe says after using Yoast plug-in, I would ignore. It is really dangerous unless you truly understand what you are doing.
After using the Screaming Frog tool, I was amazed to find that I had over 400 duplicate pages.
Thanks for your great information.