Tom Potts wrote:
>
> try a robots.txt to stop them searching the things

robots.txt is rather difficult in the circumstances. I could use mod_rewrite, but for that I'd want to know when/how the Googlebots start messing up (I suspect it is a timeout thing and so depends on free bandwidth, since it is managing 3 to 4MB before stopping). But since Googlebot has often managed to index 8MB files, blocking it would cost functionality. (Rough sketches of both options are below.) Hence I hoped to find out whether others are seeing the same thing, so I can go to Google and say "these Googlebots are burning your bandwidth unnecessarily...", or, if I can find others who definitely don't see it, I might figure out what is different.

> Also consider your data format - is it necessary to have your data encrypted
> in PDF's?

Not my data. I need to make the service robust against anything folks put in a PDF file. I don't really care if it is 13MB of random numbers; it should still be served over HTTP correctly and efficiently. Of course the Googlebots may be very disappointed when they get the 13MB only to find it is password protected - but that doesn't account for what is going wrong in the layers below.
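For reference, a minimal robots.txt sketch of the option being dismissed might look like the following; the /pdfs/ path is made up here, and a real rule would have to match wherever the files actually live, since robots.txt can only exclude by URL prefix (part of why it is awkward in these circumstances):

    User-agent: Googlebot
    Disallow: /pdfs/

And a rough mod_rewrite equivalent, again assuming a hypothetical layout, would simply refuse .pdf requests from anything identifying itself as Googlebot (placed in the vhost or an .htaccess) - at the cost of the files Googlebot does manage to index:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    RewriteRule \.pdf$ - [F]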