The Semantic Web, circa 1934


The Times has a great story today by Alex Wright on Paul Otlet’s early efforts to create a network of information akin to today’s Web. In spite of bloviating along the lines of “The hyperlink is one of the most underappreciated inventions of the last century” (Kevin Kelly, quoted in the article, apparently both asleep during the technology boom and never having read his own magazine, Wired), Wright’s piece treats Otlet’s work surprisingly fairly and is sensitive to the promise and limits of his analog approach. On the delivery side, Otlet imagined amalgamating the cutting-edge media technology of the day: telephone, radio, television. The glue for all this data would be the laborious human-directed cataloging and organization of information.

Of course there is a much longer history to the attempt to forge universal networks of information. To a historian of France, Diderot and D’Alembert’s Encyclopédie springs to mind. Spanning 28 volumes of text and plates, published over the course of two decades, and including nearly 80,000 entries, the Encyclopédie introduced readers to the cross-reference (the most underappreciated invention of the eighteenth century?) and also explicitly and implicitly connected them to the relevant texts of the day, either through cited references or outright plagiarism.

The success of the Encyclopédie stemmed as much from the print technology it exploited as from the extraordinary individuals who participated in the project. Over 140 individuals contributed articles. Some were experts in their fields, while others were generalists attempting to synthesize a wide range of knowledge. A single contributor, the chevalier de Jaucourt, produced over 17,000 articles, averaging over eight per day. Yet even in the eighteenth century, this massive endeavor could not keep pace with knowledge production. Wikipedia today of course brings a far larger population of contributors to bear, but it frames the problem no differently, simply applying twentieth-century technology to an eighteenth-century problem.

With Diderot and D’Alembert’s Encyclopédie and Otlet’s Mundaneum, we get the sense of historical actors confronting a coming tsunami of human knowledge. Both the eighteenth century’s explosion in printing and literacy and the early twentieth century’s new media challenged existing taxonomies of knowledge. What’s missing from today’s efforts, as the Times piece hints, is the human element. The old Stanford-era Yahoo was limited but extremely useful because human beings created and populated the taxonomy by hand. Google is today almighty, but it’s essentially a dumb interface, and as the corpus of digital media continues to mushroom we’re as likely to be rickrolled or googlewhacked as to find the information we seek. It remains to be seen to what extent machine learning and data mining can identify and weave together semantic meaning in digital media.


