Googlebot repeatedly looks for files that aren't on my server

John asked:

I’m hosting a site for a volunteer organization. I’ve moved the site to WordPress, but it wasn’t always that way. I suspect at one point it was hacked badly.

My Apache error log file has grown to 122 kB in just the past 18 hours. The large majority of the logged errors look like this, repeated hundreds of times today alone:

[Mon Nov 12 18:29:27 2012] [error] [client] File does not exist: /home/*******/public_html/*******.org/calendar.php
[Mon Nov 12 18:29:27 2012] [error] [client] File does not exist: /home/*******/public_html/*******.org/404.shtml

(I verified that was a Google server.)
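For reference, the check Google recommends for verifying Googlebot is a reverse DNS lookup on the client IP, a check that the hostname belongs to Google, and then a forward lookup to confirm that the hostname resolves back to the same IP. The Python sketch below illustrates that procedure. The function names are hypothetical, and the domain list is an assumption that covers Google's common crawlers (hostnames ending in googlebot.com or google.com); it may not cover every special-purpose crawler.

```python
import socket

# Assumed list of domains used by Google's common crawlers.
GOOGLE_DOMAINS = ("googlebot.com", "google.com")


def is_google_hostname(hostname: str) -> bool:
    """True if hostname is, or is a subdomain of, one of GOOGLE_DOMAINS."""
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)


def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm.

    Uses gethostbyname_ex for the forward lookup, so this sketch is IPv4 only.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)      # reverse lookup
    except OSError:                                     # herror/gaierror
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        return False
    return ip in addresses
```

Checking the suffix with a leading dot matters: a hostname like `googlebot.com.example.net` or `evilgoogle.com` would fool a naive substring test.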

I suspect there was a security hole somewhere before, likely in calendar.php, that was exploited.

The files don’t exist anymore, but many backlinks may still point to them, which would explain why Googlebot is so interested in crawling them.
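To see which missing files are being requested most often, the error log can be tallied. A minimal Python sketch, assuming the message format in the log excerpt above (`File does not exist: <path>`) and paths that contain no whitespace; `tally_missing_files` and `top_missing_files` are hypothetical helpers. Note that the log records filesystem paths, so the document root has to be stripped to get the URL path.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches messages like the ones in the excerpt above, e.g.
# "... File does not exist: /home/user/public_html/example.org/calendar.php"
# Assumes the path contains no whitespace.
MISSING_FILE = re.compile(r"File does not exist: (\S+)")


def tally_missing_files(lines: Iterable[str]) -> Counter:
    """Count how often each missing filesystem path appears in error-log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = MISSING_FILE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_missing_files(log_path: str, n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequently logged missing paths in an error log file."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return tally_missing_files(log).most_common(n)
```

Calling `top_missing_files` with the path to the error log (which varies by host) gives a short list of candidates to retire.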

How do I fix this gracefully? I would still like Google to index the site. I just want to somehow tell it to stop looking for these files.

My answer:

This is one of the things the HTTP 410 Gone status code is for.

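Google and other search engines can use this information to determine that a URL is no longer valid and is not expected to become valid again, and thus remove it from their indexes. That is the difference from the 404 Not Found responses your server is returning now: a 404 says only that nothing is there, without indicating whether the condition is temporary or permanent.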
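On an Apache host this is usually a one-line mod_alias directive such as `Redirect gone /calendar.php`. To show the logic independently of any particular server, here is a minimal Python sketch using the standard library's `http.server`. The `GONE_PATHS` set and the fallback to 404 for every other path are assumptions for illustration only; this is not how a WordPress site would actually be served.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical set of URL paths that have been removed permanently.
GONE_PATHS = {"/calendar.php"}


def status_for(path: str) -> HTTPStatus:
    """Return 410 Gone for permanently removed paths, 404 Not Found otherwise."""
    # Drop any query string, so "/calendar.php?month=11" is also treated as gone.
    bare_path = path.split("?", 1)[0]
    if bare_path in GONE_PATHS:
        return HTTPStatus.GONE
    # In this sketch every other path is simply missing; a real site would
    # serve its actual pages here.
    return HTTPStatus.NOT_FOUND


class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # send_error writes the status line, headers and a short HTML body
        # (the body is omitted for HEAD requests).
        self.send_error(status_for(self.path))

    # HEAD gets the same status code, without the body.
    do_HEAD = do_GET


def serve(port: int = 8000) -> None:
    """Run the sketch server until interrupted (not started automatically)."""
    HTTPServer(("", port), GoneHandler).serve_forever()
```

After calling `serve()`, `curl -I http://localhost:8000/calendar.php` should report `410 Gone`, while any other path reports `404 Not Found`.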

View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.