Posts Tagged ‘Googlebot’

Never let a Search Engine make a decision

February 12th, 2009 5 comments

When you build a website, you need to understand exactly how a search engine sees it. Never let a search engine make any decisions for you: when a crawler arrives at your website, you should already know exactly how it is going to crawl your site.

Every page should have only one URL. On a database-driven website it is very common for the same content to be reachable through an unlimited number of URLs. You have to enforce your URLs: each piece of content needs to be tied to exactly one URL. Put logic into your site so that when it detects content being requested through the wrong URL, it 301 redirects to the correct one. All URLs should be consistent, and the best way to do this is to make every URL lowercase. If you use a Windows server you need to be extra careful, because IIS will serve the same content regardless of the letter case in the URL, which produces duplicate content.
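The redirect logic described above can be sketched roughly as follows. This is only an illustration in Python, not the code from any particular site: the function names `canonical_url` and `redirect_for` are made up for this example, and the only rule it enforces is the lowercase one. The query string is left untouched on the assumption that parameter values may be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the one URL this content should live at (hypothetical rule: lowercase)."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),   # host names are case-insensitive anyway
        parts.path.lower(),     # the rule from the post: every path is lowercase
        parts.query,            # assumption: query values may be case-sensitive
        parts.fragment,
    ))


def redirect_for(url: str):
    """Return (301, target) when the requested URL is not canonical, else None."""
    target = canonical_url(url)
    if target != url:
        return (301, target)
    return None


if __name__ == "__main__":
    print(redirect_for("http://example.com/Products/Widget.ASPX?id=5"))
    print(redirect_for("http://example.com/products/widget.aspx?id=5"))
```

In a real site this check would run before the page is rendered, so the wrong URL never returns a 200 response with duplicate content.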

Google can now run Javascript

August 25th, 2008 33 comments

It appears that, as a first step toward making Flash content spiderable, Google has added the ability for Googlebot to execute JavaScript. I’m not sure how robust it is, but I do know that it can. I’m working with a client that had a very large amount of content that was previously not in Google because it required JavaScript to be seen. Out of nowhere, this site has over 600,000 pages of new content in Google.

I was very confused at first. I knew that those pages had not been there before and that Google should not have been able to see them. I put one of the URLs into a spider simulator, and the text was shown. I loaded it in Firefox and IE with JavaScript turned off, and the content did not show up. I loaded it in Lynx, and the content did not show up there either. I wonder why and how the top spider simulators are running JavaScript.
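One way to repeat the "JavaScript turned off" check without a browser is to look only at the text that is present in the raw HTML. The sketch below is a minimal example using Python's standard `html.parser`; the class and function names are invented for illustration, and it says nothing about how Googlebot or any spider simulator actually works. It simply ignores everything inside `<script>` and `<style>` and returns the remaining text, which is roughly what a non-JavaScript client such as Lynx would have to work with.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_without_javascript(html: str) -> str:
    """Return the text a client would see if it never executed any script."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = '<p>Static intro</p><script>document.write("Injected body")</script>'
    print(text_without_javascript(page))
```

If a phrase shows up in a search engine's index but not in this output for the same page, the content is most likely being produced by script.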