On Wednesday 02 Jul 2003 9:19 pm, Jonathan Melhuish wrote:

> Sorry, that was probably a bit unclear in my original email. Although it
> is indeed a "query string", it isn't passed in the normal "?variable=value"
> way, it's passed as a supposedly 'normal-looking' URL, eg.
>
> http://www.smssat.biz/scan/fi=products/sp=results_big_thumb/st=db/co=yes/sf=category/se=OtherReceivers/va=banner_image=/va=banner_text=.html?id=f8YyQGtr
>
> I'm not sure why they decided to do it like that. I dunno, I didn't design
> it guv ;-) But is there anything technically wrong with an URL like that?

Technically wrong? I'd say it's stretching the rules, because:

1. It uses a non-existent filesystem: it pretends (if you read the URL strictly) that there are 8 sub-directories below the .biz domain, whereas none probably exist with the names specified (with or without the = ).

2. It uses non-standard repetition: it imitates a query string and then adds a real one (each xx=xx would appear to be some form of variable=value statement) - repetition that is likely to make many a parser barf.

3. Required file data is absent: there's no 'real' file anywhere for processes like Google to grab onto. I'd presume there's some index.php, default.asp or similar behind it, but it's not stated and therefore must be assumed, which is often a bad tactic. (Assume = an ass out of u and an ass out of me.)

Stretching the letter of the 'rules' but breaking the spirit? Personally, I wouldn't like to use an engine that relied on this type of persistence, and I'm not surprised that it doesn't parse well with processes like Google.
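Just to illustrate point 2, here's a Python sketch (using only the standard library) that splits that same URL: the path holds 8 segments that merely imitate variable=value pairs, and then a real query string follows on the end.

```python
from urllib.parse import urlsplit, parse_qs

# The URL quoted above, rejoined onto one line.
url = ("http://www.smssat.biz/scan/fi=products/sp=results_big_thumb/st=db"
       "/co=yes/sf=category/se=OtherReceivers/va=banner_image="
       "/va=banner_text=.html?id=f8YyQGtr")

parts = urlsplit(url)

# Path segments that imitate a query string (pretend sub-directories)...
fake_params = [seg for seg in parts.path.split("/") if "=" in seg]

# ...followed by the one real query string.
real_params = parse_qs(parts.query)

print(fake_params)   # 8 pretend variable=value segments
print(real_params)   # {'id': ['f8YyQGtr']}
```

A strict URL parser is perfectly happy with = inside path segments, but anything that treats the path as a directory hierarchy (a search-engine crawler, a cache, a log analyser) sees eight directories that don't exist.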
Incidentally, the W3C validator site can parse the URL, but the engine itself responds with some very bad HTML. It uses an HTML4 Transitional DocType (which would usually mean that someone cares about producing valid code, as a DocType isn't any use to a browser, only to a validator engine like the one at W3C), but it uses tag attributes removed from HTML4 (marginheight), omits required attributes (img alt=""), fails to properly nest tags, omits to properly escape entities (& should be replaced with &amp;) and puts settings in HTML that should be in CSS (img border=0).

The validator URL is far too long to post here (as it includes the whole URL you quoted plus an extra query string for W3C settings). Incidentally, the validator turns the whole URL into the hexadecimal characters I mentioned last time (%2F is the escaped /, %3D the escaped =). Here's the first bit:

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.smssat.biz%2Fscan%2Ffi%3Dproducts%2Fsp

Here are 5 of the 40 errors reported by the validator:

Line 18, column 19: there is no attribute "MARGINHEIGHT"
Line 51, column 61: there is no attribute "BORDER"
Line 22, column 154: required attribute "ALT" not specified
Line 73, column 143: cannot generate system identifier for general entity "mv_pc"
  ...k/sms.ic/ord/basket.html?id=DabMJzjp&mv_pc=14" class="menubarlink">Your baske
Line 795, column 5: end tag for "TABLE" omitted, but its declaration does not permit this

It would take some time to bring that page up to the HTML4 Transitional standard proclaimed at the top of the page returned from that URL:

1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2: <html lang="en">
3: <head>
4: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

It would take longer still to get the engine itself to consistently produce valid HTML4 Transitional code - and that needs a decent understanding of the engine itself, too.
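The "mv_pc" entity error above is the easiest one to fix mechanically: any raw & written into an href must be escaped as &amp;. A small Python sketch, using the validator's own example href (the "Your basket" link text is taken from the error excerpt above):

```python
from html import escape

# A multi-parameter query string, as in the validator's
# 'cannot generate system identifier for general entity "mv_pc"' error.
href = "basket.html?id=DabMJzjp&mv_pc=14"

# escape() turns the raw & into &amp; so the attribute value is valid HTML.
link = '<a href="%s" class="menubarlink">Your basket</a>' % escape(href)

print(link)
# -> <a href="basket.html?id=DabMJzjp&amp;mv_pc=14" class="menubarlink">Your basket</a>
```

The browser unescapes &amp; back to & before requesting the URL, so the link still works; without the escaping, a validating parser tries to interpret &mv_pc as an entity reference.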
Perhaps this URL format is what we have to put up with if cookies get such a bad press. Essentially, the URL appears to be trying to track the current transaction(s) and results - exactly what a cookie should do. If a cookie were properly designed and used, the entire construct could be replaced and you'd have a normal directory and filename after the .biz/ which Google would be only too happy to parse.

Other engines like this use a server-side database to store all this info and a normal query string with the id= setting to retrieve the rest of the data from the server database. (See the DCLUG Wiki as an example of database-driven persistence.) That requires an extra step in installation and an extra layer to debug - not always appealing, but not actually that hard to implement, because so many components fit neatly within the appropriate public standards.

Is there a different engine available for the job?

--
Neil Williams
=============
http://www.codehelp.co.uk
http://www.dclug.org.uk
http://www.wewantbroadband.co.uk/