The Scientific Infrastructure for the Linguistic Web

There are two major prerequisites for the emergence of the Linguistic Web:

1) A solid Linguistic Parser

2) An extensive Lexical Semantic Database

Not only must these two elements exist; they must also be generally available to developers worldwide. We will now examine the current status of each of these two crucial components of the Linguistic Web.

Linguistic Parser

Just about everybody has heard about the buzz generated by Microsoft's acquisition of Powerset. The deal illustrates the magnitude of effort needed to develop realistic, industrial-strength linguistic software.

Scientific linguistic technology requires a long period of development and significant financial investment. Powerset's collection of natural language technologies incorporates over 25 years of intensive scientific research that originated at PARC (Palo Alto Research Center).

After so much investment in research and development, it is not clear what Microsoft's policy will be regarding making its sophisticated linguistic platform generally available.

And yet, there are other players on the block with advanced linguistic parsers who may well decide to make their scientific technology available, so that it can be used in the Linguistic Web for the mass production of language-oriented applications.

Lexical Semantic Database

To meet the requirements of the Linguistic Web, any Semantic Ontology must be constructed in terms of the natural semantic concepts used by the Faculty of Language (FL), the basis for the inborn human ability to process language.

What is needed is something like the "Semantic Map" developed by Cognition Technologies. It took more than 20 years to build and is probably the largest scientific linguistic database for English in the industry.

Is there a comparable Lexical Semantic Database available to everyone? Not at the moment. Perhaps what is needed is a Wikipedia-style collective effort to build a global Lexical Semantic Database. Such an effort must, of course, be consistent with the insights accumulated over the last 60 years of intensive research in theoretical linguistics.

Any way you look at it, an infrastructure that contains both a Linguistic Parser and a Lexical Semantic Database will be needed to jumpstart the Linguistic Web.

Imagine the economic impact of natural language solutions in all the major languages being developed worldwide, all on the same underlying linguistic platform. Of course, a new standard format for representing Natural Language Objects will also be necessary, but that is a subject for another post.