LJ Archive

At the Forge

URLs

Reuven M. Lerner

Issue #242, June 2014

How URLs, a technology that we take for granted, are changing with the times.

The World Wide Web recently celebrated its 25th anniversary. As I have written in previous columns, the growth and ubiquity of the modern Web never cease to amaze me. I get my news, television and podcasts via the Web, not to mention my groceries and airline tickets, and it allows me to communicate with my consulting clients around the world.

From my perspective, part of the genius of the Web, designed by Tim Berners-Lee, was its simplicity. Numerous researchers had been discussing hypertext for years before Berners-Lee appeared on the scene—and when he did, it was with a set of technologies that remain with us: HTTP, HTML and URLs.

I wouldn't claim that these technologies are unchanged after 25 years of usage, but it is pretty amazing to see how much they resemble their original versions. HTML has become, of course, far more sophisticated, thanks in no small part to the HTML5 set of standards, along with JavaScript and CSS. HTTP has undergone a number of changes to improve its efficiency through the years, and it would seem that HTTP 2.0 eventually will be released, bringing with it considerably improved performance and security.

The lowly URL, however, has remained largely unchanged—at least, until now. In this article, I want to spend a bit of time talking about URLs (and their cousins, URIs and URNs) and the changes that are happening in the world of Web technologies. In particular, we're seeing changes in Web application technologies that have far-reaching implications for how we use the Web and for the ways in which our URLs function—especially as the Web becomes increasingly full of mobile and single-page apps.

Uniform Resource X

The idea behind URLs (Uniform Resource Locators) is a simple one: a URL identifies a document on the Internet. If you're reading this article, you presumably know that a URL can look like this: http://example.com/foo.

The first URLs were defined in fairly simple terms. There was a protocol, followed by a colon, and then (in the case of HTTP, at least) a server name, port number and pathname.

But soon after URLs first were unveiled, people started to consider that other things deserved unique identifiers. For example, let's say you want to refer to a book. Each book has a unique ISBN, so shouldn't it be possible to refer uniquely to a book via its ISBN? The IETF, which is in charge of many Internet standards, certainly thought so, and thus created the idea of a URN, or Uniform Resource Name. Whereas a URL points to a resource on the Internet, via a protocol, server name and pathname, URNs point to off-line resources via a unique code. Thus, you can point to a book with urn:isbn:0451450523.

Where does this book reside on the network? That's not the sort of question you're supposed to ask about a URN. URNs uniquely identify resources, but they don't tell you where to find them on-line. A URN always begins with “urn:”, followed by the type of resource you're describing. Following that, you'll then have a unique identifier for that resource. A URN should be unique; many books may share the same title, but each book has a unique ISBN.
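The three-part structure described above ("urn:", then the type of resource, then the identifier) is easy to pull apart in code. The following is only a minimal sketch: the Urn class and parse_urn function are names I made up for illustration, and the check is far looser than what the URN standards actually require.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    kind: str        # the type of resource, such as "isbn"
    identifier: str  # the unique identifier within that type


def parse_urn(text: str) -> Urn:
    """Split 'urn:<type>:<identifier>' into its type and identifier."""
    scheme, first_colon, rest = text.partition(":")
    kind, second_colon, identifier = rest.partition(":")
    if scheme.lower() != "urn" or not (first_colon and second_colon):
        raise ValueError(f"not a URN: {text!r}")
    if not kind or not identifier:
        raise ValueError(f"incomplete URN: {text!r}")
    return Urn(kind.lower(), identifier)


print(parse_urn("urn:isbn:0451450523"))
```

Notice that nothing in the result says where the book can be found; the URN only names it.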

With the creation of URNs, both URNs and URLs then became specific types of URIs, or Uniform Resource Identifiers. A URI can be a URL, identifying a particular on-line location. Or a URI can be a URN, pointing to a unique resource in the world.

Parts of a URL

Although URNs certainly are a great idea, I haven't ever used them in my work. But I have used URLs extensively, and I expect that all other Web developers have done so too.

URLs are remarkably flexible, in that they can specify any protocol—and then for each protocol, a URL can specify a particular access method. The following URL, http://lerner.co.il/, thus indicates that the resource is available via the HTTP protocol. HTTP URLs then have a hostname, followed by an optional port number that defaults to 80 for HTTP and 443 for HTTPS (that is, HTTP with SSL encryption). So the previous URL also could be written as http://lerner.co.il:80/, but there's generally no need to do so. Following the slash that comes after the hostname, there is a path. So, I can say: http://lerner.co.il/team.
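To see these parts for yourself, you can split a URL with Python's standard urllib.parse module. This is a small sketch rather than a complete URL handler; the describe function and the DEFAULT_PORTS table are my own illustrative names, filled in with the two defaults mentioned above.

```python
from urllib.parse import urlsplit

# Default ports for the two protocols discussed in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url: str) -> dict:
    """Return the protocol, hostname, effective port and path of a URL."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL does not name a port
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }


print(describe("http://lerner.co.il/team"))
print(describe("http://lerner.co.il:80/team"))
```

Both calls describe the same resource, which is why there is generally no need to write the port explicitly.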

This is where things start to get a bit interesting. The “/team” is passed to the Web server at “lerner.co.il” and describes...well, we don't know what it describes. To the outside world, the “/team” path seems to indicate part of a hierarchy, and probably even a document. Inside the Web application, it can be anything at all. In modern Web applications, the “router” looks at the URL and decides which object and/or method should be activated based on the path.

Now, in most cases, this is all you're going to need. But there are some additional, often ignored parts of URLs that are becoming increasingly important. For example, the hash character (#) can exist in the URL, and it separates the main URL from the “fragment”. What is a fragment? Whatever you want it to be—that part of an HTTP URL is handled internally by the application and/or the browser.

For years, the fragment was used to let you skip to a particular part of a page. So if you went to http://example.com/foo.html#section2, and if there was an anchor inside the page whose “name” attribute had the value “section2”, the browser would move you there.
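The split between the main URL and the fragment can be shown with urldefrag, also from Python's urllib.parse module. This is just a sketch of the separation; the browser keeps the fragment for itself and requests only the part before the hash character from the server.

```python
from urllib.parse import urldefrag

# Separate what the server is asked for from what stays in the browser.
main_url, fragment = urldefrag("http://example.com/foo.html#section2")

print(main_url)  # the part that is requested from the server
print(fragment)  # the part that the browser handles on its own
```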

Another use for the fragment was to provide a URL for links that didn't exist for actual linking, but rather so that JavaScript could fire. That is, you could create a link like:


<a id="click-me" href="#">Click me</a>

If you were to click on such a link from a browser without JavaScript, nothing would happen. But in a browser with JavaScript, the page presumably would set a callback, such that clicking on the link would fire up some JavaScript code.

To date, the fragment probably has been the smallest and most easily ignored part of a URL. But that is changing, and rapidly, thanks to the rise of single-page applications. However, before I discuss those, let me first talk about REST and what it means for URLs.

REST

“REST” has nothing to do with sleep; it is an acronym for Representational State Transfer and was coined by Apache cofounder Roy Fielding in his PhD dissertation. The idea behind REST is that you often see URLs as ways to access applications and documents on the Web, including the things you want to do with those applications and documents. So you might have a /register URL on your site, as well as a /view_status or /see_book?id=100.

REST says that you should stop creating such URLs, and that you should instead see a URL as a unique way to describe an on-line resource. Thus, user 100 on your system becomes /user/100. Wait, you want to do something with user 100? That requires a verb, rather than a noun. Rather than putting the verb in the URL, or in part of it, you should use the verb that already accompanies the URL in every request, namely one of the HTTP verbs. Most of us are only familiar with the HTTP methods GET and POST, but there are a bunch of others too. (Not that they're really supported by most browsers, of course.)

Now, I must admit that when REST became a mainstream, and even preferred, way to create URLs with Ruby on Rails, I tended to resist it. But over time, I have learned to appreciate the elegant simplicity of these URLs, particularly in an age when a growing number of HTTP verbs are supported by browsers, or (as in the case of Rails) you can automatically provide a parameter that indicates the request method, overriding the POST that you always send.
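The override trick can be sketched in a few lines. Rails conventionally names the parameter _method; everything else here, including the effective_method function and the set of verbs it accepts, is my own simplified assumption about how a server might apply it, not the actual Rails code.

```python
from typing import Mapping

# Verbs that a POST is allowed to stand in for (an assumption of this sketch).
OVERRIDABLE = {"PUT", "PATCH", "DELETE"}


def effective_method(http_method: str, params: Mapping[str, str]) -> str:
    """Return the verb the application should act on for this request."""
    method = http_method.upper()
    override = params.get("_method", "").upper()
    # Only a POST may be overridden; a GET must never become a DELETE.
    if method == "POST" and override in OVERRIDABLE:
        return override
    return method


print(effective_method("POST", {"_method": "delete"}))
print(effective_method("GET", {"_method": "delete"}))
```

Restricting the override to POST matters: otherwise an ordinary link, which always sends a GET, could delete a resource.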

Rails has been particularly successful at pushing REST as a paradigm, in that controllers are assumed to provide seven different methods automatically, which are mapped in a standard way to combinations of HTTP request methods and URL patterns. Now, just because Rails does REST a certain way doesn't mean that everyone needs to do it in precisely that way, using the specific URL style and meaning that Rails has defined. But that style, or something very close to it, has become quite popular, as you can see from such packages as Grape API for Ruby or Django REST Framework for Python.
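To make the mapping concrete, here is a rough sketch of the seven standard combinations for a "users" resource, written as a tiny router. The action names and URL patterns follow the Rails convention, but the ROUTES table and the route function are illustrative only and have nothing to do with how Rails implements its router.

```python
import re

# (HTTP verb, URL pattern, action name) for a "users" resource.
ROUTES = [
    ("GET", r"/users", "index"),
    ("GET", r"/users/new", "new"),
    ("POST", r"/users", "create"),
    ("GET", r"/users/(?P<id>\d+)", "show"),
    ("GET", r"/users/(?P<id>\d+)/edit", "edit"),
    ("PATCH", r"/users/(?P<id>\d+)", "update"),
    ("PUT", r"/users/(?P<id>\d+)", "update"),
    ("DELETE", r"/users/(?P<id>\d+)", "destroy"),
]


def route(method: str, path: str):
    """Return (action, parameters) for a request, or None if nothing matches."""
    for verb, pattern, action in ROUTES:
        match = re.fullmatch(pattern, path)
        if verb == method.upper() and match:
            return action, match.groupdict()
    return None


print(route("GET", "/users/100"))
print(route("DELETE", "/users/100"))
```

The same URL, /users/100, leads to different actions depending only on the HTTP verb, which is the heart of the REST style.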

One of the interesting aspects of using URLs in a REST framework is that the URL now describes an object, which often is mapped not only to a router and/or controller, but also to an object in a database. Thus, the URL /users/1 effectively will allow me to retrieve, via the Web, information about the user with ID 1.

Although such information used to be passed in XML or even in HTML, it's now fairly standard to transmit API data using JSON, which is standard, easy to work with and implemented in all modern languages. A RESTful API that uses JSON is increasingly common as the browser portion of applications becomes more important and needs to load and save data using these APIs.
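As a sketch of what such an API does when asked for /users/1, the following looks up a record and returns it as JSON along with an HTTP status code. The USERS dictionary stands in for a real database, and its contents, the field names and the show_user function are all invented for this example.

```python
import json
from typing import Tuple

# A stand-in for a database table, keyed by user ID (made-up data).
USERS = {1: {"id": 1, "name": "Example User"}}


def show_user(path: str) -> Tuple[int, str]:
    """Return (status code, JSON body) for a path such as /users/1."""
    prefix = "/users/"
    user_id = path[len(prefix):] if path.startswith(prefix) else ""
    if not user_id.isdecimal() or int(user_id) not in USERS:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(USERS[int(user_id)])


status, body = show_user("/users/1")
print(status, body)
```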

Single-Page Apps

The most recent kind of Web application is the single-page application. From a user's perspective, you can call it a “single-page application”, because it doesn't ever need to refresh the whole page, even when you click on a link or a button. Rather, JavaScript changes the page on the fly, modifying the DOM elements and reacting to events within the browser window.

It's possible to create single-page applications using a library such as jQuery, but as things get complex, it becomes somewhat difficult and frustrating to do so. You end up spending time developing solutions that handle the infrastructure of such an application, rather than the application itself. If this sounds familiar, that's because the same thing happened about a decade ago. People were tired of writing the same code again and again for their Web applications. As a result, the notion of a “framework” was born, with Rails and Django being two of the most prominent players in that space.

Backbone.js was one of the first client-side frameworks, but it wasn't the only one. Indeed, there are dozens of frameworks that run in the browser, each claiming to follow MVC (model-view-controller) to some degree, and each allowing programmers to create rich, client-side applications in relatively short order. More recently, Backbone and its ilk have given way to a new and more thoroughly designed type of framework, with the two leading contenders being Ember.js and Angular.js. (I intend to write about both of these quite a bit in the coming year.)

For me, at least, the most striking thing when I started to learn Ember and Angular was their talk about the “router”. Now, in Rails, a router is the part of the code that maps the URL /users/101 and knows to invoke the appropriate code. And indeed, the router in Ember does something very similar, taking the URL and ensuring that the correct code is invoked.

But wait a second. I'm talking about a single-page app, right? If you're working with Ember, what is your router doing worrying about what URL is being passed? The answer, it turns out, is that the router in both Ember and Angular isn't looking at the main part of the URL, but rather at the fragment. The URL will not be /users/101 but rather myapp.html#/users/101 or something of the sort. This means that you now effectively have two URLs you need to think about: one that tells the server which application you want and then a second that tells the client-side application which JavaScript code to run.

This new use of URLs still looks somewhat strange to me, as it's making use of the fragment, which I had largely ignored for years. However, it's also exciting to see that URLs continue to be flexible, adapting to new uses for the Web, and making it possible to continue using browsers in new and interesting ways.
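The idea of two URLs living inside one can be sketched as follows. Real client-side routers are written in JavaScript and are far more capable; I use Python here only to show the split, and the client_route function, the "show_user" route name and the example.com address are all hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# The one client-side route this sketch knows about.
USER_ROUTE = re.compile(r"/users/(\d+)")


def split_url(url: str) -> Tuple[str, str]:
    """Return (path the server sees, fragment the client-side router sees)."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


def client_route(fragment: str) -> Optional[Tuple[str, int]]:
    """Map a fragment such as '/users/101' to a route name and a user ID."""
    match = USER_ROUTE.fullmatch(fragment)
    if match is None:
        return None
    return "show_user", int(match.group(1))


server_path, fragment = split_url("http://example.com/myapp.html#/users/101")
print(server_path)             # which application the server should deliver
print(client_route(fragment))  # which client-side code should run
```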

Reuven M. Lerner, a longtime Web developer, consultant and trainer, is completing his PhD in learning sciences at Northwestern University. You can learn about his on-line programming courses, subscribe to his newsletter or contact him at lerner.co.il.
