Presentation of the dissertation project and the research results to date (Baumgärtner)

My dissertation is about online context processing for natural language parsing. In this project, I investigate the influence of visual context information on a natural language processing system.


The foundation for my research is the nature of context integration during language processing in human beings. When humans process language, they are able to use information from other modalities to improve their language understanding. This shows when humans understand utterances of their communication partners by watching their facial expressions and gestures. It also applies to written language, which is understood better when the text is accompanied by graphics that contain related information. My research focuses on two characteristics of the interaction between the modalities of vision and language.

The first characteristic of cross-modal interaction between language and vision in humans is the continuous availability of information in one modality for use during the processing of information in the other. New visual information is used immediately to better understand a sentence, and linguistic information directly aids the perception of a scene.

The second characteristic is the search for new visual information to enhance the understanding of natural language. A person listening to spoken language can guide his or her visual attention towards objects and events in the environment in order to receive additional information that is helpful for language processing.


The goal of my project is to implement the above-mentioned characteristics of human language and vision interaction in an artificial system. For this I will use a language parser developed at the University of Hamburg, the Weighted Constraint Dependency Grammar (WCDG) parser. The parser is designed to process sentences in any language, although it is currently used only for German. Its input consists of a German grammar, a German lexicon, and a German sentence; its output is a parse tree with syntactic and semantic information. The parsing process is based on weighted constraints that disallow, or at least penalize, certain interpretations of a sentence.
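The idea of weighted constraints can be illustrated with a minimal sketch. The constraint names, weights, and scoring rule below are invented for illustration and are not taken from WCDG; they only show how soft constraints penalize an interpretation while hard constraints rule it out:

```python
# Hypothetical sketch of weighted-constraint scoring (not actual WCDG code).
# Each constraint carries a weight in [0, 1]; a violated constraint multiplies
# the parse score by its weight, so a weight of 0.0 acts as a hard constraint.

def score_parse(violated_constraints, weights):
    """Return the product of the weights of all violated constraints."""
    score = 1.0
    for constraint in violated_constraints:
        score *= weights[constraint]
    return score

weights = {
    "subject_agrees_with_verb": 0.0,    # hard constraint: violation forbidden
    "prefer_subject_before_verb": 0.9,  # soft preference: violation penalized
}

# Violating only the soft constraint keeps a non-zero score ...
print(score_parse(["prefer_subject_before_verb"], weights))  # 0.9
# ... while violating the hard constraint eliminates the parse entirely.
print(score_parse(["subject_agrees_with_verb"], weights))    # 0.0
```

Among all candidate analyses, the one with the highest remaining score would then be preferred.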

One advantage of this system is the integration of arbitrary external predictors that provide information to augment language processing. One of these predictors is the Plausibility Predictor Component (PPC) developed by Patrick McCrae. This predictor uses visual information provided by a context model and computes scores that the language parser can use to make the analysis of a given sentence dependent on the description of a visual scene. Although this approach already integrates the two modalities of vision and language, the system does not exhibit the two characteristics described above: continuous availability (the scores are computed only once, at the beginning of the parsing process) and the search for new information.
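A predictor of this kind can be sketched as follows. The class names, the shape of the context model, and the concrete score values are hypothetical; the sketch only illustrates the idea of scoring a candidate semantic relation against a visual scene:

```python
# Hypothetical sketch of an external plausibility predictor in the spirit of
# the PPC. The context model holds (agent, action, patient) triples observed
# in the visual scene; relations supported by the scene score higher.

class ContextModel:
    def __init__(self, relations):
        self.relations = set(relations)

class PlausibilityPredictor:
    def __init__(self, context):
        self.context = context

    def score(self, agent, action, patient):
        """Return a plausibility score for a candidate semantic relation."""
        if (agent, action, patient) in self.context.relations:
            return 1.0  # fully supported by the visual scene
        return 0.2      # implausible given the scene, but not impossible

scene = ContextModel({("man", "feeds", "dog")})
ppc = PlausibilityPredictor(scene)
print(ppc.score("man", "feeds", "dog"))  # 1.0
print(ppc.score("dog", "feeds", "man"))  # 0.2
```

The parser could then fold such scores into its constraint weights, so that a scene-supported reading of an ambiguous sentence wins over a scene-contradicting one.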

On-line Integration

The first goal of my dissertation is to model the constant integration of visual knowledge into the parsing process. I plan to do this by integrating the functionality of the PPC directly into the parser. In this way, WCDG can directly access the visual scene knowledge modelled in the provided context model whenever it is needed during the parsing process. This will give the system the same robustness to changes in its knowledge as is observed in human beings. When this first phase of my project is finished, the system will be able to react to dynamically changing context knowledge by immediately adapting the parsing process to the added information.
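The contrast with one-time scoring can be sketched with a small, invented example: a constraint that re-reads a shared, mutable context model on every evaluation immediately reflects any change to the scene, instead of relying on scores frozen at the start of parsing:

```python
# Sketch (hypothetical classes and values): instead of precomputing context
# scores once before parsing, the constraint queries the shared context model
# each time it is evaluated, so scene updates take effect mid-parse.

class LiveContextConstraint:
    def __init__(self, context, penalty=0.2):
        self.context = context  # shared, mutable set of scene relations
        self.penalty = penalty

    def evaluate(self, agent, action, patient):
        # Re-read the context model on every call.
        if (agent, action, patient) in self.context:
            return 1.0
        return self.penalty

context = {("man", "feeds", "dog")}
constraint = LiveContextConstraint(context)

print(constraint.evaluate("dog", "chases", "cat"))  # 0.2: not in the scene yet
context.add(("dog", "chases", "cat"))               # the scene changes mid-parse
print(constraint.evaluate("dog", "chases", "cat"))  # 1.0: immediately visible
```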

Knowledge Exploitation

The second human characteristic to be implemented in the system is the ability to gather new information from the visual context. This will be done by identifying those parts of the sentence that can be related to objects in the scene. A context processing unit will then explore these objects in the visual scene for new information. Any information found that is not yet part of the visual knowledge is added to the context model. The advantage of this information gathering process is an improved capability of the system to parse a given sentence: the new knowledge is used to re-evaluate the same sentence, resulting in a different and, ideally, improved analysis. Since I aim to model the same kind of information gathering as in humans, the knowledge exploitation processes of the system and of humans can be compared by checking whether the system investigates the same objects as a human observer who hears the given sentence.
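The exploitation loop described above can be sketched as follows. The function names, the representation of scene facts, and the toy scene are all invented for illustration; the sketch only shows the cycle of relating words to scene objects, inspecting those objects, and extending the context model:

```python
# Hypothetical sketch of the knowledge exploitation loop: find scene objects
# mentioned in the sentence, query the scene for facts about them, and add
# anything new to the context model before the sentence is re-parsed.

def exploit(sentence_tokens, scene_objects, query_scene, context_model):
    """Return the facts newly added to the context model."""
    added = []
    for token in sentence_tokens:
        if token in scene_objects:            # the word refers to a scene object
            for fact in query_scene(token):   # inspect that object in the scene
                if fact not in context_model:
                    context_model.add(fact)   # remember the new knowledge
                    added.append(fact)
    return added

# Toy scene: inspecting "dog" yields one relation, "man" yields none.
scene_facts = {"dog": [("dog", "on", "mat")], "man": []}
context = set()
new_facts = exploit(["the", "man", "sees", "the", "dog"],
                    {"man", "dog"}, lambda obj: scene_facts[obj], context)
print(new_facts)  # [('dog', 'on', 'mat')]
```

After such a step, the parser would re-evaluate the sentence against the enriched context model.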

Links to other Projects:

Niels Beuck; Anticipatory Incremental Language Processing in Multi-modal Context:

Once the cross-modal language parser integrates visual knowledge online (see above), it will be possible to explore the system's ability to react to a changing visual environment during every part of the parsing process. To test the system's performance in this regard (especially in comparison with human language processing while the visual context is changing), the ability to parse a sentence incrementally is essential. This makes it possible to investigate the effects on parsing at every moment.

Patrick McCrae; A Model for the Influence of Cross-Modal Context upon Syntactic Parsing:

This thesis is the basis for my own research. Mr. McCrae's system already has the ability to integrate visual context into the processing of the natural language parser. As I plan to enhance the system's capability to use visual information, I cooperate closely with Mr. McCrae during my research.

Dominik Off; Cross-Modal Enhanced Memory for Mobile Service Robots:

Part of Mr. Off’s project is to use the memory system of a robot as the basis for his implementation of an action planner. As this memory system mainly consists of information about the robot’s visual environment, the results of my research should make it possible to build an interface for natural-language interaction between the robot and a human.

Results and work to be done

Since joining CINACS, I have begun my research by integrating the functionality of the PPC into the WCDG system. It is now possible to use updated visual information whenever necessary during parsing. The knowledge is processed by computing scores for constraints of the parser; these scores change whenever relevant visual information changes in the context model. I have also begun work on the knowledge exploitation part of my project. Several algorithms for finding visual information relevant to the processed sentence are already available. I am currently working on further enhancing the system’s ability to find the information needed for parsing, especially with regard to robustness. My intention is to build a system that can match information from language and vision even if the information is modelled differently in the two modalities.