I've encountered something strange in the access log of our Apache server that I cannot explain. Requests for web pages that my colleagues and I make from the office's Windows network are repeated a couple of seconds later by another IP address that we don't recognize.
The user agent repeating our requests is:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)
Does anyone have an idea what this is?
I’ve got some more information now.
- The referrer of the repeated request is set to the URL I requested just before, and it is not exactly the same request: the protocol version changes from ‘HTTP/1.1’ to ‘HTTP/1.0’.
- It's not a single IP; the requests come from various addresses in one subnet (80.40.134.*).
- Only the first request to a resource gets repeated, so the “spy” seems to be building up some kind of cache of visited locations.
- The repeater is also picky. I tried random URLs with different HTTP status codes and different file patterns: 301s and 200s are repeated, 404s are not. Image extensions seem to be ignored.
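The observations above can be turned into a rough filter for spotting these echoes in an access log. This is a minimal sketch, assuming Apache's default "combined" log format; the 10-second window, the sample lines, 192.0.2.10 (standing in for the office) and example.com are all hypothetical. The rules (another IP, HTTP/1.0, Referer equal to the first URL, only the first request per path tracked) simply mirror the list above.

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlsplit

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_replays(lines, window=timedelta(seconds=10)):
    """Return (original_ip, replay_ip, path) for requests that look like echoes.

    Heuristics (assumptions based on the observations above): only the first
    request to a path is remembered; a later request counts as a replay if it
    comes from a different IP within `window`, uses HTTP/1.0 and carries the
    original URL as its Referer.
    """
    first_seen = {}  # path -> (ip, timestamp) of the first request for it
    replays = []
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        path = urlsplit(m["target"]).path
        if path not in first_seen:
            first_seen[path] = (m["ip"], ts)
            continue
        orig_ip, orig_ts = first_seen[path]
        if (
            m["ip"] != orig_ip
            and m["proto"] == "HTTP/1.0"
            and urlsplit(m["referer"]).path == path
            and timedelta(0) <= ts - orig_ts <= window
        ):
            replays.append((orig_ip, m["ip"], path))
    return replays


# Hypothetical sample lines, not taken from a real log.
SAMPLE = [
    '192.0.2.10 - - [25/Oct/2012:10:51:30 +0100] "GET /foobar/ HTTP/1.1" 200 10952 "-" "Mozilla/5.0"',
    '80.40.134.7 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.0" 200 10952 '
    '"http://example.com/foobar/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    print(find_replays(SAMPLE))  # [('192.0.2.10', '80.40.134.7', '/foobar/')]
```

Keying on the path alone is deliberately crude; on a busy site you would probably also want to group by virtual host or session.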
While testing, I discovered that this behavior seems to be common, since I found other clients visiting right after the first requests:
22.214.171.124 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.1" 200 10952 "-" "Mediapartners-Google"
126.96.36.199 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.1" 200 41312 "-" "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"
I wasn't aware of this practice, so I no longer see it as much of a threat. I'd still like to find out who this is, so any further help is appreciated. Later I'll check whether this also happens when I query another server whose access logs I can see, and I'll post an update here.
After some digging, I was able to determine that accesses from 80.40.134.* originated from TalkTalk Virus Alerts. This ISP monitors its users' web traffic and scans the pages they visit for viruses and malware.
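If you just want to keep these scanner hits out of your own log analysis, you can filter them by client IP. This is a minimal sketch, assuming the “80.40.134.*” pattern corresponds to the 80.40.134.0/24 network and that the range is still in use; ISPs can change such ranges, so treat the hard-coded network and the function names as hypothetical.

```python
import ipaddress

# Assumption: "80.40.134.*" from the logs above is the /24 network below.
SCANNER_NET = ipaddress.ip_network("80.40.134.0/24")


def in_scanner_range(ip: str) -> bool:
    """True if `ip` lies in the assumed scanner network (IPv6 is never inside)."""
    return ipaddress.ip_address(ip) in SCANNER_NET


def without_scanner_hits(lines):
    """Drop log lines whose first field (the client address) is in the range."""
    kept = []
    for line in lines:
        client = line.split(" ", 1)[0]
        try:
            if in_scanner_range(client):
                continue
        except ValueError:
            pass  # first field is a hostname, not an IP; keep the line
        kept.append(line)
    return kept
```

Using the `ipaddress` module instead of a string prefix check avoids false matches such as 80.40.13.4 when the prefix is written without the trailing dot.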
Mediapartners-Google is just Google AdSense. You placed Google ads on your page, so Google is reading the page text in order to provide ads targeted to the content.
The final example you gave is self-documenting: its user agent includes the URL http://www.proximic.com/info/spider.php, so try visiting it.
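Both of these crawlers announce themselves in the user agent, so a small lookup table is enough to label them. This is a sketch, assuming well-formed combined-format lines where the user agent is the last quoted field; the table contains only the two agents quoted in this thread and the function names are hypothetical. Keep in mind that user agents are self-reported and can be spoofed, and that the TalkTalk scanner sent an ordinary IE8 string, so it would not be caught this way (hence filtering it by IP range instead).

```python
from typing import Optional

# Hypothetical lookup table built only from the user agents seen in this thread.
KNOWN_AGENTS = {
    "Mediapartners-Google": "Google AdSense crawler",
    "proximic": "proximic spider, documented at http://www.proximic.com/info/spider.php",
}


def user_agent_of(line: str) -> str:
    """Return the last double-quoted field of a combined-format log line."""
    return line.rsplit('"', 2)[-2]


def classify_agent(user_agent: str) -> Optional[str]:
    """Return who is behind a user agent, or None if it is not in the table."""
    for token, owner in KNOWN_AGENTS.items():
        if token in user_agent:
            return owner
    return None


LINES = [
    '22.214.171.124 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.1" 200 10952 "-" "Mediapartners-Google"',
    '126.96.36.199 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.1" 200 41312 "-" '
    '"Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"',
]

if __name__ == "__main__":
    for line in LINES:
        print(classify_agent(user_agent_of(line)))
```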
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.