Popular Blogging Platforms May Suffer Search Engine Penalties
Do blogs suffer duplicate content penalties in the major search engines? This thought struck me a few days ago as I was beginning to rework the look and feel of this blog - with posts showing up in as many as four different locations on the blog, was there any reason to think Google, Yahoo!, MSN, and other search engines might actually be penalizing my blog? The question was brought further to the forefront when Barry Schwartz made a post on Search Engine Roundtable pointing to a thread on the same question.
Let’s consider the WordPress blogging platform for now, although this should hold true for most other blogging platforms and, indeed, a variety of content management systems. Looking at one of my previous posts, a list of SEO resources, you will find it at a variety of locations:
- Its permalink page: http://www.infohatter.com/blog/creating-a-list-of-seo-resources/
- Second page of the “front page”: http://www.infohatter.com/blog/page/2/
- Category for Advertising: http://www.infohatter.com/blog/category/advertising/
- Category for Link Building: http://www.infohatter.com/blog/category/link-building/
- Category for SEO: http://www.infohatter.com/blog/category/seo/
- More categories…
- Archives for August 2006: http://www.infohatter.com/blog/2006/08/
So as you can see, this text is replicated in full in as many as ten different locations on my blog. What we are looking at here is a conflict between user-friendliness and search-friendliness. By accessibility standards this is an ideal setup - the more ways you provide to access a piece of information, the more user-friendly it is. But does this duplication affect how the search engines index and rank the post?
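One quick way to see the duplication for yourself is to fetch a handful of the URLs above and check that the same post text appears at each of them. Here is a rough sketch of such a check - the URLs come straight from the list above, and the phrase is just a stand-in for any distinctive sentence from the post:

```python
# Sketch: fetch a few of the URLs listed above and report whether a
# distinctive phrase from the post appears on each page, showing that
# the same text is served from multiple locations.
import urllib.request

urls = [
    "http://www.infohatter.com/blog/creating-a-list-of-seo-resources/",
    "http://www.infohatter.com/blog/category/seo/",
    "http://www.infohatter.com/blog/2006/08/",
]
phrase = "list of SEO resources"  # placeholder: any distinctive sentence from the post

for url in urls:
    try:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        print(url, "->", "contains post text" if phrase in html else "no match")
    except OSError as err:
        print(url, "->", "request failed:", err)
```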
When I perform a site: search on Google for this post, I see that the single post page is the first result shown. So this is good - it means that when results are limited to my site, Google ranks the single-post page at the top, which is what I want to see. But are the rankings affected when somebody searches the entire Google index? Do the other copies of this post on this blog perhaps cause it to rank lower than it would if it were only available in one place?
I think they do. Google has typically penalized duplicate content hard in the past, often sending sites to the supplemental index for the offense. So are all bloggers getting hit with the same kind of penalty?
What can be done about this? The first solution that jumps out at me would be to include a robots noindex meta tag on the category and archive pages; this would ensure that only the front page and the single-post pages are indexed.
I would appreciate any thoughts or comments on this matter - this is something that should concern all bloggers. It may well be having a noticeable impact on readership levels. Either way, it is something that bears some serious thought.
I’ve been researching this issue a bit myself. At first I thought nothing of it, since technically the duplicate content all comes from the same domain. That may be true, but I wasn’t happy with taking the risk. Since duplicate content penalties can apply to the www vs. non-www versions of the same address, it seemed likely that the same could happen with the kind of duplication you have mentioned.
I am also using WordPress, and I have come up with a solution I think should work. Basically, I made a small plugin that adds meta name="robots" content="noindex,nofollow" to the page head, which tells search engine crawlers not to index the page or follow its links. My plugin adds this to any archive, search, or tag page (I use the Ultimate Tag Warrior plugin).
Now, this may not be the ideal way to do it, since essentially none of your archive pages are indexed in Google any more. I’m still torn on whether that is a good thing. It is good if it prevents a duplicate content penalty, but it is not so good in the sense that it drastically cuts down the number of content pages you have in the index.
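For anyone who wants to confirm the tag is actually being served, a quick spot-check is to fetch one of the archive or category URLs and look for the robots meta tag in the HTML. Here is a minimal sketch of that check - the URL is just one of the category pages listed in the post above, and the regex assumes the usual name-before-content attribute order that WordPress outputs:

```python
# Sketch: fetch a page and report the robots meta directive it serves, if any.
import re
import urllib.request

def robots_meta(url):
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
    # Assumes the tag is written name-first, e.g.
    # <meta name="robots" content="noindex,nofollow" />
    match = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']',
        html,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

directive = robots_meta("http://www.infohatter.com/blog/category/seo/")
if directive and "noindex" in directive.lower():
    print("crawlers are being told not to index this page:", directive)
else:
    print("no noindex directive found; the page is indexable by default")
```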
I’m interested in more input on this as well. I’ve tried asking the question on some SEO-oriented forums, but was met with answers that didn’t really address it. This is indeed an interesting and possibly serious topic.
You can get in touch with me about this if you want, nobodyfamous at hollywoodsnark dot com
How exactly does Google determine whether you have duplicate content?
Well, in the process of indexing your site, Google stores a copy of the textual content of your pages on its servers, and it is this copy that search queries are run against. I would imagine that while indexing a page, it could easily check portions of that page against the existing index to see whether copies already exist.
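To illustrate the principle (this is only a toy sketch, not how Google actually does it): a common approach is to break each page’s text into overlapping word “shingles” and measure how much the two sets overlap. The two page texts below are just placeholders for content pulled from two of the URLs discussed above:

```python
# Toy near-duplicate check: compare two pages by the overlap of their
# word shingles (Jaccard similarity). A score near 100% means the pages
# carry essentially the same text.
def shingles(text, size=5):
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0

page_one = "Creating a list of SEO resources ..."  # placeholder: text from the permalink page
page_two = "Creating a list of SEO resources ..."  # placeholder: same post on a category page

score = jaccard(shingles(page_one), shingles(page_two))
print(f"shingle overlap: {score:.0%}")
```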
Since most of my visitors don’t come from Google (less than 20%, which is still a lot), I don’t care too much about duplicate content - unless the duplication is done by a third party without my consent…