What does it take (for a robot) to be a ‘linguistic agent’?

  • A linguistic agent can be visualized as a robot navigating, interacting with, and ultimately contributing to the building (“terraforming”) of a Universal Knowledge Graph.
  • The primary criterion of success is not simply guessing the correct answer, but demonstrating a full understanding of what is being said – by generating the full logical form for any sentence or phrase that is being processed in a dialog. The Logical Form must state BOTH the denotation of a sentence AND the set of relevant presuppositions.
  • Conversational agents always have to deal with other such agents. A multi-agent environment is the natural habitat of a linguistic agent. In a multi-agent setting, each agent exchanges messages with other agents. This goes to the very essence of natural language.
  • Importantly, a linguistic agent must have access to a scientific linguistic parser that can handle ‘ellipsis’ and other kinds of gaps and recover the “understood” material.
  • As is well known, quantifiers play a very central role in the semantics of natural language. In order to support quantifiers, it is necessary to handle bound variables.
  • Finally, no conversational agent can get away with failing to understand elementary logic – it is a basic component of the normal use of language. To qualify as a linguistic agent, a learning agent must be conversant in Second Order Predicate Calculus (and some elementary set theory) – even BEFORE starting to learn its own field of specialization. This requires machine learning to be in some way combined with symbolic reasoning.
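
The requirements above (a logical form that carries both a denotation and presuppositions, with quantifiers binding variables) can be pictured with a small data structure. What follows is only a sketch under stated assumptions: the class names Var, Pred, Quantified and LogicalForm, the helper free_vars, and the example predicates are all hypothetical and chosen for illustration, not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable that a quantifier may bind (hypothetical name)."""
    name: str


@dataclass(frozen=True)
class Pred:
    """A predicate applied to arguments; plain strings act as constants."""
    name: str
    args: tuple


@dataclass(frozen=True)
class Quantified:
    """A quantifier binding `var` in both its restrictor and its body."""
    quantifier: str      # e.g. "forall" or "exists"
    var: Var
    restrictor: object
    body: object


@dataclass(frozen=True)
class LogicalForm:
    """Pairs the denotation of a sentence with its presuppositions."""
    denotation: object
    presuppositions: tuple = ()


def free_vars(formula, bound=frozenset()):
    """Return the names of variables not bound by an enclosing quantifier."""
    if isinstance(formula, Var):
        return set() if formula.name in bound else {formula.name}
    if isinstance(formula, Pred):
        found = set()
        for arg in formula.args:
            found |= free_vars(arg, bound)
        return found
    if isinstance(formula, Quantified):
        inner = bound | {formula.var.name}
        return free_vars(formula.restrictor, inner) | free_vars(formula.body, inner)
    return set()  # constants contain no variables


# Toy analysis of "Every student passed the exam".
x = Var("x")
lf = LogicalForm(
    denotation=Quantified(
        "forall", x,
        Pred("student", (x,)),
        Pred("passed", (x, "the_exam")),
    ),
    # The definite description "the exam" presupposes that such an exam exists.
    presuppositions=(Pred("exists", ("the_exam",)),),
)

print(free_vars(lf.denotation))          # set(): x is bound by the quantifier
print(free_vars(lf.denotation.body))     # {'x'}: the body alone leaves x free
```

The check for free variables is one simple way to see why bound variables matter: a complete sentence should leave no variable unbound, while its sub-formulas typically do.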

Syntactic Graph

What is a Syntactic Graph? It is the concept that, in the Minimalist Framework, replaces the “Parsing Tree” of earlier research. Although the term “Syntactic Graph” was not used, the concept was introduced by Chomsky as part of a radical simplification of syntactic theory.
When you build a syntactic object by recursively “merging” and “re-merging”, what you get is, mathematically speaking, a Graph.

Start with a list of Vertices (the “Numeration”). Then gradually add directed Edges, removing from the Numeration any Vertex pointed to by an Edge. Every time a new Edge is added, it must run from a Source in the Numeration to a Goal which is either in the Numeration OR accessible from the Source. This paragraph presents, in terms of Graph Theory, the entire algorithm for building Syntactic Objects in the Minimalist Framework.

Adding an Edge pointing to a Vertex still in the Numeration is called “Merge”; adding an Edge pointing to a Vertex no longer in the Numeration is called “Re-Merge”. The existing literature on the Syntactic Graph relies heavily on the terminology of theoretical linguistics. http://ow.ly/2TUId
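
The edge-adding procedure described above can be sketched in a few lines of Python. This is a toy illustration, not an implementation from the literature. The class name SyntacticGraph and its methods are hypothetical, "accessible from Source" is read here as "reachable from the Source along existing directed Edges", and self-loops are rejected; both of those are assumptions, not something the description states. The vertex labels in the example are placeholders.

```python
class SyntacticGraph:
    """Toy builder for a Syntactic Graph from a Numeration (hypothetical API)."""

    def __init__(self, numeration):
        self.numeration = set(numeration)  # vertices no edge points to yet
        self.edges = {}                    # source -> list of goals

    def accessible(self, source):
        """Vertices reachable from `source` along directed edges (assumed reading)."""
        seen, stack = set(), list(self.edges.get(source, []))
        while stack:
            vertex = stack.pop()
            if vertex not in seen:
                seen.add(vertex)
                stack.extend(self.edges.get(vertex, []))
        return seen

    def add_edge(self, source, goal):
        """Add source -> goal and report whether it was a Merge or a Re-Merge."""
        if source not in self.numeration:
            raise ValueError(f"source {source!r} is not in the Numeration")
        if source == goal:
            raise ValueError("self-loops are not allowed (assumption)")
        if goal in self.numeration:
            kind = "Merge"
            self.numeration.discard(goal)  # goal is now pointed to by an edge
        elif goal in self.accessible(source):
            kind = "Re-Merge"
        else:
            raise ValueError(f"goal {goal!r} is neither in the Numeration "
                             f"nor accessible from {source!r}")
        self.edges.setdefault(source, []).append(goal)
        return kind


graph = SyntacticGraph(["C", "T", "see", "what"])
print(graph.add_edge("see", "what"))  # Merge
print(graph.add_edge("T", "see"))     # Merge
print(graph.add_edge("C", "T"))       # Merge
print(graph.add_edge("C", "what"))    # Re-Merge: "what" is reachable from "C"
print(graph.numeration)               # {'C'}
```

When construction finishes, the Numeration in this sketch holds only the root vertex, and a vertex that has been re-merged has more than one incoming edge, which is exactly what makes the result a graph and not a tree.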

The Syntactic Graph feeds both (so-called) “Interfaces”: (a) phonological Spell-Out, and (b) the “Conceptual-Intentional” system (formerly Logical Form).

In the Minimalist Framework, each word is usually represented (in the Syntactic Graph) by a Vertex or two, with somewhat complex internal structure. Nanosyntax is a further development of the Minimalist Framework, at a finer-grained resolution, with the Syntactic Graph handling word-internal structure.

Syntactic Graph is the natural language for describing the meaning of natural language.

Representing text in computers by alphabetic characters is archaic technology – it already was ancient in Antiquity… The transition to using the Syntactic Graph simply makes sense, as a scientifically valid representation of text and its meaning.

Increased Interest in the Semantic Web

The Semantic Web has been receiving more and more attention recently, as can be seen from the fact that Google bought Metaweb, the company that developed a social semantic database. It is less well known that Microsoft has recently licensed advanced linguistic technology from Cognition. As a result, Microsoft now has access to the technology of both Powerset and Cognition.

When Microsoft bought Powerset they did not integrate it into their main search product Bing, so one might have concluded that Microsoft was not really that interested in utilizing Natural Language technology, but simply wanted to make use of the brain power of Powerset’s team of computer scientists. But now that Microsoft has also licensed the scientific linguistic parsing from Cognition, it seems clear that they do in fact have plans to utilize Natural Language technology.

At this point, out of the three companies which concentrate on the computerization of the most recent theoretical linguistics, two (Powerset and Cognition) have been grabbed up by Microsoft. This leaves Linguistic Agents’ technology as the only available alternative on the market.

Linguistic Technology

Humans have a unique ability (known in linguistics as the Language Faculty) to process complex syntactic structures. The Language Faculty is a subject of study for theoretical linguists.

After many years of intensive research in the field of theoretical linguistics, there has been significant progress in the deciphering of the general properties of human language.  The recent expansion of research beyond English and other familiar European languages has enabled the refinement and verification of the central discoveries of theoretical linguistics. 

People have tried preprocessing text with a linguistic parser that generates a “parse tree”, but dealing with plain text directly leads to much better results. As a result, the progress achieved by theoretical linguistics in the study of human language is not presently utilized in deep learning.

The Scientific Infrastructure for the Linguistic Web

There are two major pre-requisites for the emergence of the Linguistic Web:

1) A solid Linguistic Parser

2) An extensive Lexical Semantic Database

Not only do these two elements need to exist – they must also be generally available to developers worldwide. We will now examine the current status of each of these two crucial components of the Linguistic Web.

Linguistic Parser

Just about everybody has heard about the big buzz generated by Powerset’s acquisition by Microsoft. This is just an example of the magnitude of effort needed for the development of realistic, industrial strength linguistic software.  

Scientific linguistic technology necessitates a very long period of development and significant financial investment. Powerset’s collection of natural language technologies incorporates over 25 years of intense scientific research, which originated at PARC (Palo Alto Research Center).

After all this effort invested in research and development, it is not clear what Microsoft’s policy will be with regard to making their sophisticated linguistic platform generally available.

And yet, there are other players on the block with advanced linguistic parsers, who just may go ahead and make available the scientific technology which can be used in the Linguistic Web for the massive production of language oriented applications.

Lexical Semantic Database

To meet the requirements of the Linguistic Web, any Semantic Ontology must be constructed in terms of natural semantic concepts used by Language Faculty (FL), the basis for the inborn human ability to process language.

What is needed is something such as the “Semantic Map” developed by Cognition Technologies.  It took more than 20 years to build, and is probably the largest scientific linguistic database for English in the industry.

Is there a comparable Lexical Semantic Database available for everyone? Not at the moment. Perhaps what is needed is a Wikipedia-type collective effort in order to build a global Lexical Semantic Database. Obviously, this effort must be in sync with the accumulated insights of the last 60 years of intense research in theoretical linguistics. 


Any way you look at it, an infrastructure which contains both a Linguistic Parser and a Lexical Semantic Database will be needed in order to jumpstart the Linguistic Web.

Imagine the economic impact of all these various natural language solutions, in all the major languages, being developed worldwide, all using the same underlying linguistic platform. Of course, a new standard format for the representation of Natural Language Objects will also be necessary, but this is a subject for another posting.

21 Semantic Roles for Linguistic Web

The following is a preliminary list of Semantic Roles (known in linguistics as “thematic roles”) for use in the Linguistic Web. The Roles are part of the intermediate protocol that makes it possible for linguistic processing services to present their results in a simple and unified form.

(1)  AGENT


(3)  CAUSE

(4)  THEME





(9)   GOAL

(10)  SOURCE


(12)  PATH

(13)  MANNER




(17)  RESULT


(19)  TIME
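
As a rough illustration of how such roles could serve as part of an intermediate protocol, the sketch below encodes the roles listed above and serializes one analyzed sentence as JSON. Everything about the message layout is an assumption made for illustration: the SemanticRole enum, the to_message helper, and the field names "predicate", "roles", "id", "role" and "text" are hypothetical. Only the roles that actually appear in the list above are included; the remaining numbers are deliberately left out.

```python
import json
from enum import Enum


class SemanticRole(Enum):
    """Roles from the list above, keyed by their numbers in that list."""
    AGENT = 1
    CAUSE = 3
    THEME = 4
    GOAL = 9
    SOURCE = 10
    PATH = 12
    MANNER = 13
    RESULT = 17
    TIME = 19


def to_message(predicate, role_fillers):
    """Serialize a predicate and its role fillers to JSON (hypothetical format)."""
    return json.dumps({
        "predicate": predicate,
        "roles": [
            {"id": role.value, "role": role.name, "text": text}
            for role, text in role_fillers
        ],
    })


# Toy analysis of "Mary sent the letter from Paris to London yesterday".
message = to_message("send", [
    (SemanticRole.AGENT, "Mary"),
    (SemanticRole.THEME, "the letter"),
    (SemanticRole.SOURCE, "Paris"),
    (SemanticRole.GOAL, "London"),
    (SemanticRole.TIME, "yesterday"),
])
print(message)
```

A consumer of such a message would not need to know which parser produced it, which is the point of presenting results in a simple and unified form.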






Linguistic Technology Going 2.0.

The Linguistic Web envisions high-resolution linguistic intelligence tools freely accessible to the developers of natural language applications, opening the possibility of easy collaboration between providers of linguistic analysis software and developers of natural language applications.

Companies like Powerset and Cognition Technologies already have solid, scientifically based software for linguistic analysis, the fruit of long years of research. Of course, application developers do not presently have free access to anything like these proprietary solutions. Nevertheless, the very existence of such advanced systems proves that it can be done.