Quintin Par asked:

Is there an official API to iplists.com from where I can get the list of spiders?

My intention is to whitelist these IPs for site scraping.

I don’t know of any list of IP addresses for “good” search engine bots, and if one existed it would be horribly out of date pretty quickly, as you’ve already discovered.

One thing you can do is create a bot trap. This is simple in theory: create a page that is linked from your web site but hidden from normal users (e.g. via CSS tricks), then Disallow it in robots.txt. Wait a week, since legitimate search engines may cache robots.txt for that long, and then start banning anything that hits the trap page (e.g. with fail2ban).
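As a rough illustration of the detection half of the trap, here is a minimal Python sketch that scans access-log lines and collects the client IPs that requested the trap page. The trap path `/bot-trap/`, the function name `trap_hits`, and the assumption that the server writes the common/combined log format are all made up for the example; adjust them to your setup. It only finds offenders and leaves the actual banning to whatever tool you use.

```python
import re

# Hypothetical trap URL. It would also be listed in robots.txt, e.g.:
#   User-agent: *
#   Disallow: /bot-trap/
TRAP_PATH = "/bot-trap/"

# Matches the start of a common/combined log format line (assumed format):
#   ip ident user [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*"'
)


def trap_hits(log_lines, trap_path=TRAP_PATH):
    """Return the set of client IPs that requested the trap page."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        # Ignore any query string when comparing against the trap path.
        path = match.group("path").split("?", 1)[0]
        if path == trap_path:
            offenders.add(match.group("ip"))
    return offenders


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /bot-trap/ HTTP/1.1" 200 512',
        '198.51.100.4 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 1024',
    ]
    print(sorted(trap_hits(sample)))
```

Only run something like this against log entries written after the waiting period, so that crawlers still working from an old cached robots.txt are not caught.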

