Googlebot is crawling my site right now and it’s killing my server. It’s only crawling one or two pages a second, but those pages are really CPU intensive. I have already added those CPU-intensive files to the robots.txt file, but Googlebot hasn’t detected those changes yet. I want to block Googlebot at the apache.conf level so my site can be back right now. How can I do this? This one Apache instance is hosting a few PHP sites and a Django-powered site, so I can’t use .htaccess files. The server is running Ubuntu 10.04.
Assuming you don’t actually want your site delisted from Google (which the accepted answer will eventually cause), set a crawl delay value for your site in Google Webmaster Tools. Google reportedly does not support the Crawl-delay directive in robots.txt, though you may wish to set that value there for other search engines and crawlers to use.
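For the other crawlers, a robots.txt entry along these lines might do it. This is only a sketch: the `/expensive-report/` path is a made-up placeholder for your CPU-heavy pages, and the 10 second delay is an arbitrary value to tune. Crawl-delay is a non-standard directive, so each crawler decides whether and how to honor it.

```
# Hypothetical example: /expensive-report/ stands in for the CPU-heavy pages.
User-agent: *
# Ask crawlers that honor this directive to wait about 10 seconds between requests.
Crawl-delay: 10
# Keep all compliant crawlers away from the expensive pages.
Disallow: /expensive-report/
```

Googlebot should skip the Crawl-delay line, so its crawl rate still has to be set in Webmaster Tools as described above.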
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.