On Monday 06 October 2008 10:29, Simon Waters wrote:

> Does anyone else here host fairly large (10MB+) PDF files on a web
> server? Know anyone who does?
>
> Do you ever record Googlebots going a bit "mad" for them?
>
> I see requests for the same file every 6 minutes, which are logged in
> Apache as 200. It will happen a few times for the same file; over the
> weekend one Googlebot retrieved one 13MB file 13 times at six-minute
> intervals.
>
> It doesn't happen often, but it seems very wasteful when it does. The
> file was also retrieved (200) even though there is now a reverse proxy
> in place to stop things chewing up bandwidth, so the request should
> have been served by the reverse proxy.
>
> With the resources it spent trying to download that one PDF, the
> Googlebot could probably have reindexed vast chunks of the Internet.
>
> Any ideas on a good place to ask?
>
> I may just get the reverse proxy to convert such requests into refresh
> requests. Not exactly HTTP compliant, but I don't suppose anyone will
> ever notice.

Try a robots.txt to stop them crawling those files (a sketch follows
below).

Also consider your data format - does the data really need to be locked
up in PDFs?

Tom te tom te tom
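P.S. A minimal robots.txt sketch for keeping Googlebot off the PDFs -
the /pdf/ path here is a hypothetical stand-in for wherever the files
actually live:

    # Served from the web root, e.g. http://www.example.org/robots.txt
    # /pdf/ is a hypothetical path - adjust to the real location.
    User-agent: Googlebot
    Disallow: /pdf/

Googlebot also honours the non-standard * and $ wildcards, so
"Disallow: /*.pdf$" would block every PDF regardless of directory,
though not all crawlers support that extension.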
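P.P.S. On the reverse-proxy point in the quoted message: explicit
caching headers give the proxy (and any well-behaved client) permission
to reuse the file instead of re-fetching it. A sketch for Apache,
assuming mod_expires and mod_headers are loaded, with the same
hypothetical /pdf/ path:

    # Virtual host fragment - marks the PDFs as cacheable for a week.
    <Location /pdf/>
        ExpiresActive On
        ExpiresByType application/pdf "access plus 7 days"
        Header append Cache-Control "public"
    </Location>

For static files Apache also answers conditional GETs with 304 Not
Modified based on the Last-Modified header, so repeat fetches that send
validators cost a few hundred bytes rather than 13MB - though that only
helps if the client bothers to make its requests conditional.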