We don’t need no stinkin’ URI schemes

OK, so for a while I, like Jeremy, have lamented the disappearance of the URI scheme (the “http://” bit) from URLs seen in advertising and the like. Maybe it’s because my day job is basically that of a network (and security) consultant. Making communication happen through network protocols is my bread and butter. A full URL like http://www.abc.net.au/news is unambiguous in intent. It tells a suitable application which host to connect to, on what port (implied by the scheme unless stated explicitly), and what protocol it should talk. It also indicates the particular datum of interest.
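To make that concrete, here is a minimal sketch using Python’s standard urllib.parse module. The describe() helper and the DEFAULT_PORTS table are my own illustrative names, and the table only covers the two schemes discussed in this post.

```python
from urllib.parse import urlsplit

# Well-known default ports; only HTTP and HTTPS are listed here (an assumption for brevity).
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe(url):
    """Break a URL into the pieces a client needs before it can connect."""
    parts = urlsplit(url)
    # An explicit port in the URL wins; otherwise fall back to the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "resource": parts.path or "/",
    }

print(describe("http://www.abc.net.au/news"))
# {'protocol': 'http', 'host': 'www.abc.net.au', 'port': 80, 'resource': '/news'}

# Without the scheme the parser cannot tell host from path: everything lands in "resource".
print(describe("abc.net.au/triplej"))
```

The second call is the interesting one: a strict parser sees no protocol, no host and no port in the shortened form, which is exactly the information the scheme-less ad copy leaves for the reader (or the browser) to guess.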

But there are two aspects to review in Jeremy’s argument. Firstly, how do humans recognise that abc.net.au/triplej is a contraction of a URL? Secondly, is there really a technical need to specify the protocol?
As for “knowing” that a string of text is a shorthand URL, people can look for the following telltales:
1. There might still be the almost-deprecated URI scheme (http://), a piece of internet jargon that people learned circa 1994 (the average Joe Blow, at least).
2. People know that www means World Wide Web.
3. Words punctuated only by “.” and “/” are normally an internet thing.
4. The letters in URLs are almost always in lowercase.
5. People are familiar with the common TLDs (“.com”, “.org”, “.au” and so on), which are another dead giveaway.
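If you wanted a machine to apply the same five telltales, a rough sketch might look like the following. This is a toy heuristic of my own, not any standard algorithm: the url_clues() name, the clue labels and the tiny KNOWN_TLDS sample are all assumptions made for illustration.

```python
import re

# Matches a leading URI scheme such as "http://".
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)

# A deliberately tiny sample of TLDs (assumption; the real list is far longer).
KNOWN_TLDS = {"com", "org", "net", "au", "name", "museum"}

def url_clues(text):
    """Return which of the five telltales a piece of text exhibits."""
    clues = []
    if SCHEME.match(text):
        clues.append("scheme")            # telltale 1
    rest = SCHEME.sub("", text)
    host = rest.split("/")[0]
    labels = host.split(".")
    if labels[0].lower() == "www":
        clues.append("www")               # telltale 2
    if len(labels) > 1 and re.fullmatch(r"[A-Za-z0-9./-]+", rest):
        clues.append("dots-and-slashes")  # telltale 3
    if any(c.isalpha() for c in text) and text == text.lower():
        clues.append("lowercase")         # telltale 4
    if len(labels) > 1 and labels[-1].lower() in KNOWN_TLDS:
        clues.append("known-tld")         # telltale 5
    return clues

print(url_clues("abc.net.au/triplej"))
# ['dots-and-slashes', 'lowercase', 'known-tld']
```

Even with the scheme and the www gone, the shortened address still trips three of the five checks, which is roughly the argument of this post in code form.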

So what happens when URIs become non-obvious because some of these distinguishing marks are missing? (I am not a linguist or semanticist, so I may well have this wrong.)

Jeremy has already seen the demise of point 1. The same goes for point 2. When I was a lad, domains always had separate host records, so company.com.au would always have a host www.company.com.au to provide its web presence. Mail to joe@company.com.au was almost invariably steered by the MX record for the domain to mail.company.com.au, the mail host. Not so today. Maybe it was the fact that doubleyoodoubleyoodoubleyoo is hard to say (which is why some trendy geeks say stuff like dubdubdub or wahwahwah).

Point 3 is interesting. Back in the 60s you always wrote abbreviations with full stops/periods between the capitalised letters, like A.B.C. or C.S.I.R.O. or the Man from U.N.C.L.E. You just don’t do that now. So basically those “.”s have been repurposed as domain name delimiters, and I reckon that this is actually the strongest clue we have now.

With point 4, domain names can just as easily be uppercase (DNS is case insensitive), but the file part of the URL is often case sensitive. Because UNIX systems were ruling the roost when web servers were first deployed, and we tended to write all file names in lowercase, this idiom seemed to stick. Finally, for point 5, it ain’t so easy nowadays. Jeremy’s domain name is under the “.name” TLD, but what about .museum? Does anyone even know that http://australian.museum is a valid address?
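The case point (point 4) is easy to show in code. Here is a small sketch, again with Python’s urllib.parse, that folds the host part to lowercase and leaves the file part alone. The normalise_case() name is mine, and the sketch assumes the URL carries no user name or password before the host, since those would be lowercased too.

```python
from urllib.parse import urlsplit, urlunsplit

def normalise_case(url):
    """Lowercase the host (DNS ignores case) but leave the path untouched."""
    parts = urlsplit(url)
    # Only the network location is folded; the path may be case sensitive on the server.
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))

print(normalise_case("http://WWW.ABC.NET.AU/News"))
# http://www.abc.net.au/News
```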

But anyway I guess we humans cope, and if the publicity gurus do misjudge when they prepare their ad copy, then they don’t get hits on their website. So I guess the URLs that aren’t real obvious get removed from the internet gene pool through natural selection.

On the second aspect, the idea that there are protocols on the ‘net other than HTTP: does it really matter? Firstly, HTTP is almost always the starting point in any case. If you do need to jump from HTTP to something more private like HTTPS, then the browser will do that for you when the server redirects it. If you need to stream multimedia, then the .m3u file you hit will point you to something more appropriate. And semantically, the combination of your client application and server might be able to determine what you intended anyway. For instance, if you type arnoldschicken.com.au into your phone, I reckon it should just give you the option to dial their nearest store. Or if you type arnoldschicken.com.au into your GPS navigator, it should by default set the nearest store as your destination. The semantic bit could either be derived from the user-agent, or the device could add context through the URL: say, arnoldschicken.com.au/locations might return a list of parseable locations for the GPS, or arnoldschicken.com.au/phonenumbers could return a list of numbers (that could be connected via SIP). Alternatively, standard SOAP calls might be invoked to give similar information. Certainly more work can and should be done in this space. So I guess defaulting to HTTP may well make sense when people initiate the connection; the application can then switch to a more appropriate protocol or scheme if it needs to.
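As a rough sketch of that idea, the snippet below assumes HTTP when no scheme is given and, for a bare domain, appends a path based on the kind of device. Everything here is hypothetical: expand(), the CONTEXT_PATHS table and the device names are mine, and the /locations and /phonenumbers paths are just the made-up examples from the paragraph above, not any existing convention.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from device type to the context path suggested above.
CONTEXT_PATHS = {"gps": "/locations", "phone": "/phonenumbers"}

def expand(text, device=None, default_scheme="http"):
    """Turn what a person typed into a full URL, defaulting to HTTP."""
    text = text.strip()
    if "://" not in text:
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    # Only add device context when the person gave a bare domain with no path.
    if parts.path in ("", "/") and device in CONTEXT_PATHS:
        parts = parts._replace(path=CONTEXT_PATHS[device])
    return urlunsplit(parts)

print(expand("arnoldschicken.com.au"))
# http://arnoldschicken.com.au
print(expand("arnoldschicken.com.au", device="gps"))
# http://arnoldschicken.com.au/locations
```

An explicit scheme or path always wins in this sketch, so someone who does type the full thing still gets exactly what they asked for.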

So in conclusion, while “http://” might be dead, humans are pretty smart at recognising truncated URLs, and machines will get better (if they aren’t already) at helping us make better use of them. (And just for a final point: how many of us just bang a few letters into the Google search bar and get what we want to find pretty quickly in any case?)