In Praise of the Hyperlink

Web 2.0. Love the term or hate it, you’ve certainly heard it. Even if you’re a hardened cynic and you pride yourself on not drinking Tim O’Reilly’s koolaid, it’s hard to deny that something is going on: something new, something that is just the start of a brave new world 2.0.

The theme of this year’s Reboot is Renaissance. It doesn’t take much of a stretch to compare that term with the ubiquitous “Web 2.0”.

The common perception of 15th century Northern Italy is to view it as the birthplace of a whole new movement in art and culture: a Culture 2.0, if you will. We tend to think of the Renaissance as an almost revolutionary movement, sweeping aside the old-school 1.0 dark ages.

But the Renaissance didn’t come out of nowhere. The word itself means rebirth, not birth. The movers and shakers of the Renaissance — the analogerati of Florence — weren’t trying to make a break with the past. They were trying to get back to their roots. At its heart, the Renaissance was a very conservative movement with an emphasis on reviving and preserving classical ideas. By classical, I mean Greek and Roman. There is a direct line of descent from the Acropolis in Athens to the beautiful buildings built in Copenhagen during the Danish Renaissance. The building blocks of the Renaissance were centuries-old ideas about mathematics, aesthetics, and science.

There is a lesson for us there. With all this talk of a Web 2.0, there’s a danger that we as web developers, whilst looking to the future, are forgetting our past. In our haste to forge a new kind of World Wide Web, we run the risk of destroying the fundamental building blocks that helped create the Web that we fell in love with in the first place.

I don’t intend to run through all the building blocks that form the foundation of the Web. Each one deserves its own praise. HTTP, for example, the protocol that enables the flow of pages on the Web, is worthy of its own love letter.

I’d like to focus on one very small, very simple, very beautiful building block: the hyperlink.

The hyperlink is an amazing solution to an old problem. That problem is classification.

Language is the most powerful tool ever used by man. Together with its offspring writing, language enables us to document things, ideas, and experiences. I can translate a physical object into a piece of information that can be later retrieved, not only by myself, but by anyone. But there are economies of scale with this kind of information storage and retrieval. The physical world is a very, very big place filled with a multitude of things bright and beautiful; creatures great and small. If we could use the gift of language to store and retrieve information on everything in the physical world, right down to the microscopic level, the result would be transcendental.

The first person to seriously tackle the task of cataloguing the world was born after the Renaissance. Bishop John Wilkins lived in England in the 17th century. He was no stranger to attempting the seemingly impossible. He proposed interplanetary travel three centuries before the invention of powered flight. He is best remembered for his 1668 work, An Essay towards a Real Character and a Philosophical Language.

The gist of Wilkins’s essay is explained by Jorge Luis Borges in El idioma analítico de John Wilkins (The Analytic Language of John Wilkins).

He divided the universe in forty categories or classes, these being further subdivided into differences, which was then subdivided into species. He assigned to each class a monosyllable of two letters; to each difference, a consonant; to each species, a vowel. For example: de, which means an element; deb, the first of the elements, fire; deba, a part of the element fire, a flame.

You can find more delvings into Borges’s essay on a weblog by Matt Webb; the fittingly named interconnected.org.

The problem with Wilkins’s approach will be obvious to anyone who has ever designed a relational database. Wilkins was attempting to create a one to one relationship between words and things. Apart from the sheer size of the task he was attempting, Wilkins’s rigidity meant that his task was doomed to fail.

Still, Wilkins’s endeavour was a noble one at heart. One of his contemporaries recognised the value and scope of what Wilkins was attempting.

Gottfried Wilhelm von Leibniz possessed one of the finest minds of his, or any other, generation. It’s a shame that his talent has been overshadowed by the spat between Newton and himself caused by their simultaneous independent invention of calculus.

Leibniz wanted to create an encyclopaedia of human knowledge that was free from the restrictions of strict hierarchies or categories. He recognised that concepts and notions could be approached from different viewpoints. His approach was more network-like with its many to many relationships.

Where Wilkins associated concepts with sounds, Leibniz attempted to associate concepts with symbols. But he didn’t stop there. Instead of just creating a static catalogue of symbols, Leibniz wanted to perform calculations on these symbols. Because the symbols correlate to real-world concepts, this would make anything calculable. Leibniz believed that through a sort of algebra of logic, a theoretical machine could compute and answer any question. He called this machine the Calculus ratiocinator. The idea is a forerunner of Turing’s universal machine.

Let me tell you about another theoretical device. It’s called the memex (short for “memory extender”). This device was proposed by Vannevar Bush in 1945 in an article in the The Atlantic Monthly called “As We May Think”. Bush described the memex as being electronically linked to a library of microfilm. The device, contained within a desk, would be capable of following cross-references between books and films. This almost sounds like hypertext.

But there may be a form of proto-hypertext that precedes the memex.

In recent years the works of James Joyce have been revisited and re-examined through the prism of hypertext. Ulysses and Finnegan’s Wake make sense when viewed, not linearly, but as a network of interconnected ideas. Marshall McLuhan was heavily inspired by Joyce’s communication technology. The medium was very much the message.

For most of us, Finnegan’s Wake remains an impenetrable book, at least in the narrative sense. It might make more sense to us if we suffered from a medical condition called apophenia: the perception of connections and meaningfulness in unrelated things.

This isn’t necessarily an affliction. In his book Pattern Recognition, William Gibson describes an apopheniac cool-hunter hired by marketers to detect the presence of Gladwellian tipping points in a product’s future.

Apophenia is a boon for conspiracy theorists. If you’re fond of a good conspiracy theory, I recommend staying away from the linear and predictable Da Vinci Code. For a real hot-tub of conspiracy theory pleasure, nothing beats Foucault’s Pendulum by Umberto Eco.

For a conspiracy theorist, there can be no better tool than a piece of technology that allows you to arbitrarily connect information. That tool now exists. It’s called the World Wide Web and it was invented by Sir Tim Berners-Lee.

There was no magical “Eureka!” moment in the invention of the Web. It was practical problem solving, not divine revelation that resulted in the building blocks of Universal Resource Identifiers (URIs), the Hypertext Transfer Protocol (HTTP), and the Hypertext Markup Language (HTML). Berners-Lee’s proposal built on top of the work already done by Ted Nelson and Douglas Engelbart, the inventors of the first true hypertext system in 1965.

If there was anything revolutionary about the World Wide Web, it was the fact that it was not patented, instead being declared free for all to use. That spirit of scientific sharing clearly didn’t rub off on British Telecom who attempted to enforce a patent which they claimed gave them intellectual property rights over the concept of hyperlinks. The claim was, fortunately, laughed out of court.

The World Wide Web is the ultimate MVC framework. URIs are the models controlled by HTTP and viewed through HTML. While the view may seem like the least significant component, it is the simplicity of HTML that is responsible for the explosive growth of the Web.

There was nothing new about markup languages. Standard Generalised Markup Language had been around for years. Before that, red pens allowed editors to literally mark up text to indicate meaning.

Like SGML, HTML used tags — delineated with angle brackets — to nest parts of a document in descriptive containers called elements. The P element can be used to describe a paragraph, the H1 element describes a level one heading, and so on.

The shortest element is the most powerful. A stands for anchor. Nestled within the anchor element is the href attribute. This attribute, short for hypertext reference, is the conduit through which the dreams of Leibniz, Joyce, and a thousand conspiracy theorists can finally be realised.

Anybody could create anchors containing hypertext references. Just about everybody did. The Web grew exponentially in size and popularity. With every new web page and every hyperlink, the expanding Web became a more valuable and powerful aggregate resource.

This power was harnessed by Sergey Brin and Lawrence Page. The concept behind their PageRank algorithm is simple: links are a vote of confidence. If a lot of links point to the same page, that page is highly regarded. By combining this idea with traditional page analysis, they created the best search engine on the Web; Google.

In order to measure the PageRank of everything on the Web, the googlebot spider was unleashed. In some ways, the googlebot is like any other user agent: it visits web pages and follows links. It’s also possible to see the googlebot as a kind of quantum device.

When you or I visit a web page that has, say, ten links, there are two theories about what happens. According to superposition, the next page we visit exists only as a probability. Not until we make a decision and click on a link does the page resolve into one of the ten possibilities.

The alternative view is the many worlds interpretation. According to this theory, visiting a page with ten links would cause the universe itself to branch into ten different universes. You or I will remain in the universe that matches the link we clicked. But the googlebot is different: it follows all ten links at once, spidering alternate worlds.

I have first hand experience of Google’s stockpile of parallel universes. To celebrate Talk Like a Pirate Day, I created a simple server-side script. You can pass in the URL of a web page and the script will display the contents interspersed with choice pirate phrases such as “arr!”, “shiver me timbers!”, and “blow me down!”. The script also rewrites any hrefs in the page so that the pages they point to are also run through the pirate-speak transmogrifier.

It was amusing. It even appeared on Metafilter. The problems started later on. I began to get irate emails, even phone calls, from website owners demanding that I remove their files from my server. I was even threatened with the Digital Millennium Copyright Act. I was fielding angry emails from people all over the world in charge of completely disparate websites.

The googlebot had landed on my Talk Like a Pirate page (perhaps it followed a link from Metafilter). Then it began to spider. It never stopped. Somewhere within the Googleplex there is a complete one to one scale model of the World Wide Web that’s written entirely in pirate-speak.

Now when site owners do a search for their websites to check their ranking, the pirate facsimile often appears before the original. I can’t help it if my Googlejuice is better than theirs.

I began to feel remorse when I heard from the proprietor of a spinal surgery clinic in Florida who told me that potential customers were being scared away by messages detailing “professional treatment, me hearties!”

I have since added a robots.txt file but it can be a long time between googledances. Parallel universes don’t just disappear overnight. I guess the googlebot isn’t a quantum device after all because, it seems, it can’t be everywhere at once. That’s where it falls down. How is it supposed to deal with websites that are updated frequently?

Trying to define what a blog is can be a slippery task. Most definitions include the words “online journal”. I’ve been told that my online journal isn’t a blog because I don’t have comments enabled. I must have missed the memo.

What really makes a blog a blog isn’t the addition of comments or the fact that it’s an online journal. The defining characteristic of a blog is the presence of permalinks. Permalink, a portmanteau word from permanent and link, should be a tautology. All links should be permanent.

Permalinks, and by extension, blogs, encourage linking. Instead of simply saying “here’s my opinion”, blogs allow us to say “here’s a permanent linkable address for my opinion.”

The earlist blogs were link logs, places you could visit to find links that somebody thinks are worth visiting. Even now, I find that the best blog posts are often ones that point out the connections between seemingly separate links. Bloggers are natural apopheniacs; conspiracy theorists who can back up their claims not just with references to their sources but with hypertext references… hrefs.

Even though all blogs have permalinks, there’s something inherently transient in the nature of blogging. It’s a tired cliché but the aggregate web of blogs really is like a conversation. The googlebot can’t hope to follow all the links spawned by all these voices speaking at once. Technorati does okay though.

Technorati is also the breeding ground for some infectious little ideas called microformats. These microformats embrace and extend the Hypertext Markup Language. Making use of the little-known rel attribute, the anchor element can be made even more powerful. In XFN, XHTML Friends Network, the addition of rel equals friend, colleague, met, and other descriptors adds extra semantic weight to a link (as yet there is no Enemies Network, but Brian Suda and I are working on a draft specification).

The bonus semantics offered by microformats can be harvested and collated to form a clearer picture of the connections that were previously less defined.

Microformats are the nanotechnology for building a semantic web.

That’s lowercase semantic web. The uppercase Semantic Web still lies in our future. Another theoretical future technology is XHTML 2, wherein any element whatsoever can have a hypertext reference.

Perhaps we aren’t worthy of such a bounty of hrefs. Right now hrefs exist only in the anchor element and yet still we manage to abuse them.

I like JavaScript. It is, as Douglas Crockford put it, the world’s most misunderstood programming language. The problem lies not with the JavaScript language but its integration into hypertext documents.

The javascript pseudo-protocol is an abomination. It is not a valid hypertext reference.

Using a pointless empty internal page reference is almost as bad. If you can’t provide a valid resource for the href value, don’t use the anchor element. Anchors are for links. Don’t treat them as empty husks upon which you hang some cool Ajaxian behaviour. Respect the link.

If we value and cherish the links of today, who knows what the future may bring?

Maybe Bruce Sterling is right. Maybe we’ll have an internet of things. Spimes, blogjects, thinglinks… whatever the individual resources are called, they’ll have to be linkable: hyperlinked addressable objects existing in our regular, non hyper-, space.

It sounds like an exciting future. We live in an equally exciting present.

We have all come together here in Copenhagen because of how much we love the World Wide Web. I bet every one of you has a story to tell about the first time you “got” the Web. Remember that thrill? Remember the realisation that you were interacting with something that was potentially neverending; a borderless labyrinth of information, all interconnected through the beautiful simplicity of the hyperlink. We may have grown accustomed to this miracle but that doesn’t make it any less wondrous.

We are storytellers, no longer huddled around separate campfires, we now sit around a virtual hearth where we are warmed by the interweaving tales told by our brothers and sisters. Everyone is connected to everyone else by just six degrees of separation. Thanks to the hyperlink, we can find those connections and make them tangible.

The dream of hypertext has become a reality.