Does anyone else here host fairly large (10MB+) PDF files on a web server? Know anyone who does? Do you ever see Googlebot going a bit "mad" for them?

I see requests for the same file every six minutes, each logged in Apache as a 200. It happens several times in a row for the same file: over the weekend one Googlebot retrieved the same 13MB file 13 times at six-minute intervals. It doesn't happen often, but it is very wasteful when it does (there is a quick log-scanning sketch below if anyone wants to check their own logs).

What is odder, Apache logged those retrievals as 200s even though there is now a reverse proxy in front of it precisely to stop this sort of thing chewing through bandwidth, so the file should have been served from the proxy's cache. With the resources it spent re-downloading that one PDF, Googlebot could probably have reindexed vast chunks of the Internet.

Any ideas on a good place to ask? I may just get the reverse proxy to convert such requests into refresh requests (a sketch of one interpretation is below). Not exactly HTTP compliant, but I don't suppose anyone will ever notice.
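For anyone who wants to scan their own logs for the same pattern, here is a rough Python sketch. It assumes Apache's standard "combined" log format; the log path and the ten-minute threshold are placeholders, not anything from the real setup.

#!/usr/bin/env python3
"""Rough sketch: flag URLs that Googlebot re-fetched at short
intervals.  Assumes Apache "combined" log format; the LOG path
and the ten-minute threshold are invented for illustration."""
import re
from collections import defaultdict
from datetime import datetime

LOG = "/var/log/apache2/access.log"   # placeholder path
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "GET (?P<url>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

fetches = defaultdict(list)           # url -> list of fetch times
with open(LOG) as fh:
    for raw in fh:
        m = LINE.match(raw)
        if not m or "Googlebot" not in m.group("ua"):
            continue                  # only GET requests from Googlebot
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        fetches[m.group("url")].append(ts)

for url, times in fetches.items():
    times.sort()
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    quick = sum(1 for g in gaps if g <= 10)   # repeats within 10 minutes
    if quick:
        print(f"{url}: {len(times)} fetches, {quick} within 10 min of the last")

Run against the access log, it prints each URL Googlebot fetched again within ten minutes of a previous fetch, which is exactly the six-minute pattern above.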
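And if "refresh requests" ends up meaning "answer a quick repeat with a 304 Not Modified even though the bot sent no validator", here is a minimal sketch of that idea as a standalone handler. The document root, port and 30-minute window are invented, and a real version would live in the reverse proxy itself rather than a separate process.

#!/usr/bin/env python3
"""Sketch: remember recent (client, URL) fetches and answer a
quick repeat with 304, skipping the multi-megabyte body.  Not
strictly HTTP compliant (304 without a request validator);
DOCROOT, the port and WINDOW are placeholders."""
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DOCROOT = "/var/www/pdfs"        # placeholder document root
WINDOW = 30 * 60                 # treat repeats within 30 min as fresh
recent = {}                      # (client ip, url) -> last served time

class RefreshingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        key = (self.client_address[0], self.path)
        now = time.time()
        if now - recent.get(key, 0) < WINDOW:
            self.send_response(304)   # spare re-sending the 13MB body
            self.end_headers()
            return
        fpath = os.path.join(DOCROOT, self.path.lstrip("/"))
        if not os.path.isfile(fpath):
            self.send_error(404)
            return
        recent[key] = now
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(os.path.getsize(fpath)))
        self.end_headers()
        with open(fpath, "rb") as fh:
            while True:               # stream in chunks, not all at once
                chunk = fh.read(64 * 1024)
                if not chunk:
                    break
                self.wfile.write(chunk)

if __name__ == "__main__":
    HTTPServer(("", 8080), RefreshingHandler).serve_forever()

(No path sanitising or cache-header niceties here; it is only meant to show the "lie with a 304" trick.)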