The Semantic web: is it, or isn’t it?

Noam Chomsky (Wikipedia)

For me, Noam Chomsky’s entire structure of Transformational Grammar came crashing down when I read Lectures on Government and Binding. To be quite honest, I doubt many people have actually read it. Compared to the breezy text of his original Transformational Grammar papers, Lectures on Government and Binding was as dense as mud, and no more appetising.

What shattered the illusion for me was this: once I had worked out what was actually being presented, it became clear that the endless increase in complexity which had robbed Transformational Grammar of its prescriptive power was being side-stepped by a putative ‘move-alpha’, which would supposedly sort out all the problems. Just as importantly, semantics entered a sentence only very late in its proposed construction in the mind. In other words, Chomsky was suggesting that we start talking, and only decide that we want to say something, and what it is, after the sentence has begun.
Of course, we all have experience of doing that. But any kind of rational discourse would become impossible if that really were all there was to it.
If you’re wondering where this is going, let me explain. Although Transformational Grammar is now part of mainstream linguistics, its origins lay in the development of machine translation and computer interpretation of human discourse. Chomsky never really got anywhere with turning his transformational grammar into anything that was useful to a computer. In fact, a computer is much more adept than a human at retaining every single inflection of every single word in a language, so its need for transformational rather than traditional grammar is intrinsically smaller.
However, despite all that, computer semantics remains a holy grail for the IT community.
If you have ever tried getting something translated by Google, Babelfish or others, and you actually know how to read the language it was translated into, you will already be aware of how miserably useless machine translation is. If you don’t speak any foreign languages, and are an ardent technology fan, you are probably already getting ready to write a comment saying that computer translation is the future. Please feel free to write the comment, but please also accept that to speakers of more than one language, the current results in online translation are simply ludicrous.
Web 3.0, an idea that has been around for almost ten years, is about a world wide web which is tagged and interpreted on semantic grounds. In other words, the semantic Google of the future will be able to find for you not merely pages that contain the words you are interested in, but will refine them according to the meaning you actually want to find.
For example, if you type ‘Octavius spider Roman’ into Google, it will currently find first a pen by Montblanc, then an article about Spider-Man, then a forum post by someone who styles himself Gaius Octavius, and only then some Wikipedia disambiguation. It is quite a long way down the page that we find an article about Octavian the Roman emperor, even though if you merely typed in Octavius, it would take you straight to the relevant article. The reason is that Google doesn’t understand that, if you put Octavius and Roman together, you are almost certainly looking for the first Roman emperor, even if there is a character called Octavius in Spider-Man. As a semantic being, you, a knowledgeable human, would have put that together in an instant, and you might then have wondered what my interest in spiders was in the context of the first Roman emperor. Google, though, which works only on probabilities based on web-linking, has no such insight.
A semantic based web would, we hope, get you there immediately.
So much for the theory. I’m writing this article with the help of a plugin called Zemanta, which aims to be one of the harbingers of Web 3.0 by searching out links connected to what I’m writing about, rather than just the words I’m using. It’s good, in parts. Aside from the fact that it doesn’t seem able to process more text than the first paragraph, it works with more or less the same effectiveness as Google. True, it did find me a variety of articles on the semantic web, and on Chomsky, but Google would have done the same, though it might have found different articles. The team behind Zemanta are more selective than Google, and there seems to be a genuine desire to exclude web-aggregators and other anti-semantic net-entities. Still, that isn’t quite enough to make a summer.
Web 3.0 is an ambitious, science-based, intellectually rigorous attempt to make the information superhighway a bit more like a highway and a bit less like a city dump. In a world where social networking has become the predominant form of electronic communication, and even the ubiquitous blogging is being consigned to a dusty fifth place, the aim is laudable. Instead of more chit-chat, more froth about memes going viral on Twitter, global conversations which involve millions but contain almost no content, the notion of a self-organising web is attractive, if not compelling.
It is not compelling because the fundamental building block of a semantic web is a semantic computer, and such a thing has yet to be invented. In the same way that we can isolate certain visual cues and use them to generate robot vision, we can isolate certain linguistic cues. With semantic tagging, rather like the old Strong’s numbers for would-be scholars of New Testament Greek who were not willing to learn Greek, we can network some of the semantics, joining the dots between words. But even Strong’s numbers involved a fairly heavy measure of pre-interpretation, based on a text which was itself not changing. Tagging words so that a computer can associate them will not create a truly semantic result.
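The ‘joining the dots’ that semantic tagging offers can be sketched in a few lines. This is a toy illustration, not any real Web 3.0 system: the TRIPLES list and the pages_about function are my own hypothetical names, and the subject-predicate-object triples are hand-written, which is precisely the point — the machine can associate ‘Octavius’ with ‘Roman’ only because a human has already done the interpreting.

```python
# A minimal sketch of semantic tagging: pages are annotated with
# subject-predicate-object triples, and a query matches on those
# tags rather than on raw word co-occurrence.

# Hand-curated triples: the heavy "pre-interpretation" the article mentions.
TRIPLES = [
    ("Octavius", "is_a", "Roman emperor"),
    ("Octavius", "also_known_as", "Augustus"),
    ("Doctor Octopus", "appears_in", "Spider-Man"),
    ("Doctor Octopus", "first_name", "Otto"),
]

def pages_about(subject, context):
    """Return triples whose subject matches and whose object mentions the context word."""
    return [t for t in TRIPLES
            if t[0] == subject and context.lower() in t[2].lower()]

# 'Octavius' + 'Roman' now resolves straight to the emperor, skipping Spider-Man.
print(pages_about("Octavius", "Roman"))
```

Note that the disambiguation happens entirely in the curated data, not in the code: nothing here understands what ‘Roman’ means, it merely looks up associations a human wrote down in advance.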
If your main interest is web trends, then get this one: Web 3.0 is coming, and it will revolutionise the internet. But if your main interest is linguistics, then don’t hold your breath. A truly semantic web — until someone can come up with something a lot swisher than Chomsky’s Transformational Grammar — is no nearer today than it was when Chomsky first began to dream.
  • Glenn J. Bingham

    With great respect for the Father of Modern Linguistics, I cannot quibble with your thoughts that Government and Binding Theory is abstruse at best, and a step backwards by Chomsky in understanding natural language processing at worst. Having programmed computers in the Dark Ages (late ’60s, early ’70s) on natural language processing projects, I needed to explain to the computer that every time a character string had a blank, a new word was to follow. I may have had a stronger feel for the complexity of the beast than some do now, since they are equipped with ready-made word processors. I agree that we really don’t know any better how to digest information now than we did pre-G&B.

    Furthermore, there is a prevailing notion that syntax IS semantics. Worded in a softer tone, the thought that semantics is somehow “read off of” the syntactic structure leads to the notion that there is no message in mind that needs to be expressed, a basic tenet of anyone’s communication theory. Despite formal indoctrination in the field, I never bought that idea either. Language is the pairing of a message to an appropriate string of sounds…and vice versa. The message is logically prior to the syntactic expression of it, which is logically prior to the phonetic expression. Any exposition that does not focus on that arrangement is not to be trusted.

    I am pleased to see some renewed interest in some of the earlier works of Charles Fillmore, et al. After 40 years of wheeling in circles trying to find and then move alpha, I think we may be getting back to the nuts and bolts of the matter. There is hope. But as you say, an effective Web 3.0 may not spring forward any time soon. We may need to recapture a couple of lost decades of research.

    Glenn J. Bingham
