Robots.txt for WordPress blog
There is a nice plugin that helps you apply “noindex, nofollow” to certain pages to remove a lot of duplicate content. Plugins like this don’t cover pages that are outside WordPress, though. Here is an example WordPress robots.txt.
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
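Before uploading rules like these, it can help to check which URLs they actually block. The following is only a rough sketch using Python's standard urllib.robotparser module; the helper name is_blocked, the host domain.com, and the sample paths (including the /2007/05/my-post/ permalink) are assumptions for illustration, so substitute URLs from your own blog.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
"""


def is_blocked(path, agent="Googlebot"):
    """Return True if the rules above disallow `path` for `agent`.

    `is_blocked` is a hypothetical helper and domain.com is a placeholder host.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, "http://domain.com" + path)


if __name__ == "__main__":
    # Sample paths are assumptions; your permalink structure may differ.
    samples = ["/wp-admin/", "/feed/", "/page/2/", "/date/2007/05/",
               "/2007/05/my-post/", "/about/"]
    for path in samples:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Each Disallow line is matched as a prefix of the URL path, so a rule such as /page/ only covers paths that start with /page/. If your blog uses a different URL layout, the rules need adjusting.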
I went from 100 pages in the supplemental index to 9. This is not for all blogs, since some blogs use different URLs. Check your supplemental index by typing:
site:domain.com –view ***
This will show the pages you have in the supplemental index.

I tried this on my site and found 7 pages in the supplemental index, mostly SWF (Flash) files. I’m not sure if there is anything I can do about those pages.
Also, I copied and pasted the query (Site:domain.com –view ***) from this post and it did not work, so I tried various combinations and eventually realized the problem was the capital “S”. I changed it to lowercase and it worked.
Thanks for the useful search modifier.
Interesting, I did not know the site: command was case sensitive. I type my posts in Word, and I guess Word capitalized it for me. Thanks for the heads up.
Hi,
I hope you enjoyed SES New York. I have tried this command a few times since the last time you posted this and it seems to no longer work. Not sure if something changed.
Try this
I finally got my site indexed near perfectly in Google with this WordPress robots.txt example.
It seems that nowadays the major search engines, especially Google, can filter blog “supplemental content” very easily too. Still, it never hurts to set up a robots.txt, of course.