Presentation of the dissertation project and the research results to date (Beuck)

Motivation

The starting point of my project is the observation that human language processing is inherently incremental, i.e. processing starts on partial input and an interpretation of the current input is available at each point in time. Early approaches to natural language processing (NLP) were based on a sequence of discrete modules, each working independently on the complete output of the previous one. The underlying assumption was that the problem tackled in each module can be solved independently, without feedback from later stages of processing. Contrary to this assumption, human language processing has been shown to be inherently incremental [1] and highly interleaved [2]. Incremental processing starts before the input is complete and generates interpretations for partial input; additional input is integrated into the current interpretation where possible, and triggers reanalysis otherwise. Interleaving of processing includes syntactic and semantic expectations driving speech recognition, and contextual knowledge driving linguistic processing. This context is especially rich in a face-to-face communication situation, where there are multiple communication channels in different modalities (e.g. gestures) and a common visual context to refer to.

To approach the cognitive example set by human language processing, we have to make sure that an artificial language processing system is able to (see the interface sketch after this list):

  • process partial input as early as possible

  • provide partial analyses as early as possible

  • generate expectations to be fed back to other modules like speech processing

  • integrate expectations derived from the context (e.g. the dialog history, visual context or other communication channels like gestures)
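
As a minimal sketch of how these requirements could surface in a module interface (written in Python; all class and method names are hypothetical and not taken from an existing system):

    # Hypothetical interface of an incremental processing module; the
    # names and signatures are illustrative assumptions, not an existing API.
    class IncrementalModule:

        def feed(self, increment):
            """Consume the next piece of partial input (e.g. one word)
            as soon as it becomes available."""
            raise NotImplementedError

        def current_analysis(self):
            """Return the best partial analysis for the input seen so far."""
            raise NotImplementedError

        def expectations(self):
            """Return expectations about upcoming input, to be fed back
            to earlier modules such as speech recognition."""
            raise NotImplementedError

        def integrate_context(self, hint):
            """Integrate an expectation derived from the context, e.g. the
            dialog history, the visual scene, or gestures."""
            raise NotImplementedError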

Platform

The platform of my research is the Weighted Constraint Dependency Grammar (WCDG) system [3]. The WCDG system is already capable of incremental syntax processing, i.e. of assigning a dependency structure to a prefix of a sentence and of using this partial analysis as the starting point for analyzing an extended prefix. For about 95% of the test sentences, this is faster than starting from scratch [4].
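
The control flow of this prefix-by-prefix mode can be sketched as follows; note that the parser interface used here (empty_analysis, extend) is invented for the illustration and is not the actual WCDG API.

    # Schematic prefix-by-prefix parsing with reuse of the previous
    # partial analysis; an illustration, not the WCDG implementation.
    def parse_incrementally(parser, words):
        analysis = parser.empty_analysis()  # hypothetical helper
        for i in range(len(words)):
            prefix = words[:i + 1]
            # Adapting the previous partial analysis is usually cheaper
            # than re-parsing the extended prefix from scratch.
            analysis = parser.extend(analysis, prefix)
            yield analysis  # an interpretation is available at every step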

Goal

The goal of my project is to design and implement an architecture for passing partial analyses and expectations between different modules in language processing, including an interface for integrating the processing of other modalities, especially vision. Such an interface will be bidirectional, with input from NLP guiding (visual) attention and input from the external modality giving hints for disambiguation in NLP. This part of the work builds on the project of Patrick McCrae, who worked on syntactic disambiguation via context integration. Furthermore, I am cooperating with Christopher Baumgärtner, who is working on online context integration and attention guiding.

To determine the requirements of such a system, I will investigate the applications of partial analyses, especially in the presence of a visual context. I have identified the following possible applications:

  • early reference resolution

  • anticipation of upcoming input

  • feedback to the sender on processing status

The possible benefits of this approach include:

  • improved speed and accuracy of syntax parsing

  • improved performance of other modules through top-down expectations derived from the analysis of incoming language and from multi-modal context integration

  • improved user acceptance in human-computer/robot dialogue situations through verbal or nonverbal feedback while the utterance is still in progress

  • reproduction of psycholinguistic observations related to incremental language processing, such as anticipatory eye movements or garden pathing

Early Reference Resolution

A partial linguistic analysis can be used to establish references to entities from the context. There are several benefits of doing this as early as possible:

  • Determine whether a nominal phrase (NP) already denotes a unique object from the context or whether additional complements, such as an upcoming prepositional phrase (PP), are needed (see the sketch after this list). In this way, ambiguities like the PP-attachment problem could be handled where a context is available (compare [5]). If an ambiguity cannot be resolved in this way, its early detection enables timely feedback in the form of a clarifying question.

  • Visual focus can be shifted to referenced objects while the utterance is produced, thus giving feedback to the producer.

  • Visual focus can be shifted to gather additional details from the visual context. This information can be used to check the current interpretation for consistency with the context and to help process the rest of the utterance.

  • Extract partial instructions and possibly begin execution, which saves time and makes it possible to give feedback.
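
As a toy illustration of the first point, the sketch below checks whether a partial NP already denotes a unique object in a small visual context; the object attributes and the matching scheme are invented for this example.

    # Toy early reference resolution: match the attributes gathered so
    # far for a partial NP against objects in the visual context.
    context = [
        {"id": "cup1",  "type": "cup",  "color": "red"},
        {"id": "cup2",  "type": "cup",  "color": "blue"},
        {"id": "ball1", "type": "ball", "color": "red"},
    ]

    def resolve(partial_np):
        """Return all context objects consistent with the partial NP."""
        return [obj for obj in context
                if all(obj.get(k) == v for k, v in partial_np.items())]

    print(resolve({"color": "red"}))                 # "the red ..." is still ambiguous
    print(resolve({"color": "red", "type": "cup"}))  # "the red cup" is already unique
    print(resolve({"type": "cup"}))                  # "the cup" stays ambiguous: early
                                                     # detection allows a clarifying question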

Building up Expectations

When only parts of an input sentence are known, there are several ways to anticipate different aspects of the upcoming input. These expectations can be inferred linguistically, from the partial input alone, or contextually, by integrating other sources. One source of linguistic anticipation is the set of constraints violated when parsing a partial input with a constraint dependency grammar parser like WCDG: such violations might hint that the sentence is incomplete and at what kind of continuation is missing.

Upcoming words anticipated via linguistic constraints will mostly be underspecified. For example, a violated transitivity constraint of a verb proposes an upcoming object, but provides no further details about it. To enrich the anticipation, the underspecified slots can be augmented in several ways. One approach is to add ontological aspects to the valence information of a verb, e.g. the direct object of “to eat” is probably edible. Context information is a rich source of anticipation as well: partially specified events (an action and its arguments) can be extracted from partial input and matched against events that are visible, or possible to execute, in the visual context, thereby identifying candidates for filling the unknown parts of the event.
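
A minimal sketch of this kind of anticipation, with an invented mini-lexicon and invented ontology labels:

    # Derive underspecified expectations from unsaturated verb valences
    # and enrich them ontologically; all entries are invented.
    lexicon = {
        "eat":  {"valence": ["subject", "object"], "object_type": "edible"},
        "take": {"valence": ["subject", "object"], "object_type": "graspable"},
    }

    def anticipate(verb, filled_slots):
        """Return expectations for valence slots not yet seen in the input."""
        entry = lexicon[verb]
        return [{"slot": slot,
                 # add an ontological restriction where the lexicon has one
                 "restriction": entry.get(slot + "_type", "unrestricted")}
                for slot in entry["valence"] if slot not in filled_slots]

    # After "Peter eats ..." the subject slot is filled; the violated
    # transitivity constraint proposes an (edible) object:
    print(anticipate("eat", {"subject"}))
    # -> [{'slot': 'object', 'restriction': 'edible'}]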

Anticipation of the upcoming input has several applications in natural language processing, including:

  • top-down expectations for speech recognition

  • placeholders for incremental syntax parsing, with upcoming words matched to predicted placeholders that are already integrated into the analysis structure (see the sketch after this list)

  • the simulation of anticipatory eye movements in search of possible upcoming referents, as has been observed in humans
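
The placeholder matching mentioned in the second point can be sketched as follows; the data structures are invented for the illustration and do not reflect the actual WCDG representation.

    # A predicted placeholder is already part of the partial analysis
    # and is later matched against an actually observed word.
    analysis = [
        {"word": "Peter", "role": "subject"},
        {"word": "eats",  "role": "root"},
        {"word": None,    "role": "object", "placeholder": True},
    ]

    def integrate(analysis, word):
        """Match the next word to an open placeholder if one exists,
        otherwise append it as new, unattached material."""
        for node in analysis:
            if node.get("placeholder") and node["word"] is None:
                node["word"] = word          # prediction confirmed; the
                node["placeholder"] = False  # structure stays intact
                return analysis
        analysis.append({"word": word, "role": "unattached"})
        return analysis

    print(integrate(analysis, "pizza"))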

Feedback

In face-to-face dialogue between humans, feedback is given by the listener while an utterance is being produced. This feedback includes verbal and nonverbal cues on:

  • whether the utterance was successfully processed up to that moment

  • which entities in the visual context are related to the utterance, by eye movement

  • ambiguities that could not be resolved, possibly in the form of a clarifying question

This kind of feedback could be provided by a robotic system, given that its language processor works incrementally, with early reference resolution and a capacity to anticipate certain aspects of the upcoming input.
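
A rough sketch of such a feedback loop; the processor interface and the event names are assumptions made for this illustration.

    # Emit feedback events while the utterance is still in progress;
    # all processor methods used here are hypothetical.
    def feedback_loop(processor, words, emit):
        for word in words:
            processor.feed(word)
            if not processor.current_analysis():
                emit("not-understood")        # processing failed so far
                continue
            emit("ack")                       # processed successfully
            for referent in processor.resolved_referents():
                emit(("gaze", referent))      # look at related entities
            ambiguities = processor.unresolved_ambiguities()
            if ambiguities:
                emit(("clarify", ambiguities))  # ask back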

Current Status of the Project

The initial conception and literature review stage is finished and the project is currently in the implementation phase for a first prototype.

The next steps include:

  • designing and implementing means of integrating feedback from later stages of processing into WCDG

  • combining the context integration architecture designed by Patrick McCrae with the incremental processing mode of WCDG

  • selecting specific psycholinguistic observations to reproduce

Connections of my PhD project to other CINACS projects

Multi-modal context integration into language processing is the research topic of two other CINACS PhD students, Patrick McCrae and Christopher Baumgärtner, with whom I cooperate. In his project “A Model for the influence of Cross-Modal Context upon Syntactic Parsing”, P. McCrae designed a system for resolving syntactic ambiguities via semantic knowledge gained from the visual context. In my project, I am expanding on this foundation in cooperation with C. Baumgärtner to permit online context integration during incremental language processing.

With the project of Dominik Off, “Cross-Modal Enhanced Memory for Mobile Service Robots”, my project shares the scenario of giving instructions to a service robot in natural language. This scenario is interesting for my research, as it is rich in references to objects in the surroundings and to possible actions therein. A long-term goal is to integrate the planning and language processing components on a robot platform like TASER.

The Tsinghua CINACS students Wei Qiao and Kaixu Zhang work on the problem of Chinese word segmentation. I have discussed with them the possibilities of integrating word segmentation with syntax parsing and the similarities to word segmentation in speech recognition. The WCDG parser used in my project is already capable of working with word lattices, the ambiguity structure found in word segmentation, and is thus a promising common platform for research in this direction. I hope to intensify this cooperation in the future.
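
To illustrate the shared data structure, the following sketch encodes two alternative segmentations of a four-character sequence as a small word lattice; the lattice and the enumeration function are invented for this example (transliterations c1 to c4 stand in for the characters).

    # A word lattice stores alternative segmentations compactly: nodes
    # are positions between characters, edges are word hypotheses.
    # Here: [c1 c2][c3 c4] vs. [c1][c2 c3][c4].
    lattice = [
        (0, 1, "c1"),
        (0, 2, "c1c2"),
        (1, 3, "c2c3"),
        (2, 4, "c3c4"),
        (3, 4, "c4"),
    ]

    def segmentations(lattice, start, end):
        """Enumerate all paths (segmentations) through the lattice."""
        if start == end:
            yield []
            return
        for s, e, word in lattice:
            if s == start:
                for rest in segmentations(lattice, e, end):
                    yield [word] + rest

    print(list(segmentations(lattice, 0, 4)))
    # -> [['c1', 'c2c3', 'c4'], ['c1c2', 'c3c4']]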

References

[1] R. P. G. van Gompel and M. J. Pickering. Syntactic parsing. Oxford Handbook of Psycholinguistics, 2007.

[2] M. Mayberry, M. W. Crocker and P. Knoeferle. Learning to Attend: A Connectionist Model of the Coordinated Interplay of Utterance, Visual Context, and World Knowledge. Cognitive Science, 33:449-496, 2009.

[3] K. A. Foth. Hybrid Methods of Natural Language Analysis. PhD thesis, Universität Hamburg, Department Informatik, 2007.

[4] N. Beuck. Inkrementelles Parsing mit Constraint-Dependency-Grammatiken (Incremental Parsing with Constraint Dependency Grammars). Diploma thesis, Universität Hamburg, 2009.

[5] T. Brick and M. Scheutz. Incremental natural language processing for HRI. In Proc. HRI 2007, the ACM/IEEE International Conference on Human-Robot Interaction, Washington, DC, USA, March 9-11, 2007.