On Fri, Feb 21, 2003 at 10:18:17 +0000, Neil Williams wrote:
> How do I open a remote HTTP file using Perl?
> The ordinary open(HANDLE,"$file"); doesn't work; ( -r $file ) reports that
> it is unable to read the file.
> I want to load a page from a separate server (actually, execute a search
> using the query string from a bookmarked URL and read the results) and read
> the file into the script for analysis. I can't open static pages at the
> moment, but I would expect that opening a dynamic page wouldn't differ once
> the process is done.
> Anyone with ideas?

use python; :)

urllib in Python is very easy. urllib.urlretrieve() fetches the URL into a
temporary file (typically under /tmp), so you can then open it and play.

Here is a little script that demonstrates this. It goes on to search the page
for all full URLs with a regular expression, and finds:

http://www.w3.org/TR/REC-html40/loose.dtd
http://devoncornwall.pm.org/
http://www.southwestlug.uklinux.net/
http://www.lug.org.uk

I have been doing this sort of thing very recently, when I wrote a plugin for
a blog that retrieves the images at image URLs and generates thumbnails for
them. See: http://db.cs.helsinki.fi/~hendry/log/

#!/usr/bin/env python2
import urllib, re

url = 'http://www.dclug.org.uk/'
# Matches absolute http, https, file and ftp URLs
fileurlpattern = r'(?:http|https|file|ftp)\:+[\/\-\_\.\w]+[\/\w][\?\&\+\=\%\w\/\-\_\.]*'

# urlretrieve() returns (local_filename, headers); open the local copy
f = open(urllib.urlretrieve(url)[0])
s = f.read()  # read contents of file into string
f.close()

for i in re.finditer(fileurlpattern, s):
    print i.group()

-- 
The Mailing List for the Devon & Cornwall LUG
Mail majordomo@xxxxxxxxxxxx with "unsubscribe list" in the
message body to unsubscribe.
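The script in the reply is Python 2 (note the print statement). In Python 3, urlretrieve() moved to urllib.request, and urllib.request.urlopen() can read the page straight into memory with no temporary file. The following is a minimal Python 3 sketch along those lines, not the poster's code: it reuses the regular expression from the script unchanged, assumes the page is UTF-8 (undecodable bytes are replaced), and the helper names fetch_page and extract_urls are hypothetical.

```python
import re
import urllib.request

# Same pattern as the Python 2 script: absolute http/https/file/ftp URLs.
FILE_URL_PATTERN = re.compile(
    r'(?:http|https|file|ftp)\:+[\/\-\_\.\w]+[\/\w][\?\&\+\=\%\w\/\-\_\.]*'
)


def fetch_page(url, timeout=30):
    """Hypothetical helper: download url and return its body as text.

    Assumes UTF-8; bytes that fail to decode are replaced, not fatal.
    A URL with a query string (e.g. a bookmarked search) works the same way.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8', errors='replace')


def extract_urls(text):
    """Hypothetical helper: return every full URL found in text, in order."""
    return [match.group() for match in FILE_URL_PATTERN.finditer(text)]
```

Calling extract_urls(fetch_page('http://www.dclug.org.uk/')) would print-ready list the full URLs found on that page, much as the script above does. Keeping the download and the pattern matching in separate functions means the regular expression can be checked against a sample string without any network access.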