Presentation of the dissertation project and the research results to date
Introduction
Is it possible for an autonomous agent to infer sensorimotor laws through interaction with its environment and to develop cognitive behaviour? We investigate this question by combining approaches from the fields of developmental robotics, computational neuroscience, neurophysiology and psychology.
In contrast to many traditional beliefs resting on the idea that the brain stores an internal representation of the world, O'Regan and Noë propose a theory in which the outside world itself serves as an external memory [4]. Perceptual experience is the result of the learned mastery of what they call sensorimotor contingencies (SMCs), which combine physical properties of the environment with those stemming from the particular sensor system. This approach naturally accounts for the differences in the perceived quality of sensory experience across modalities and stresses the necessity of an interplay between sensory perception and motor actions, which had already been suggested by von Helmholtz [1].
Creatures are not born endowed with a mature set of SMCs. These have to develop, and are shaped by the given sensor and actuator properties in a lifelong learning process. Similarly, the field of developmental robotics deals with the progressive and incremental development of proficiencies in machines. Instead of carrying out a particular predefined task, the robot can discover its perceptual, cognitive and behavioural capabilities through actions based on its own physical morphology and the dynamic structure of its environment. To achieve this goal, exploratory activity as well as some kind of motivation is crucial.
Materials & Methods
Our system architecture is (so far partially) implemented on the Robotino® platform (Festo Didactic, Germany, http://www.festo-didactic.com/). This robot is equipped with nine infrared distance sensors, a colour webcam and two microphones. Movements are realized via an omnidirectional drive consisting of three modules. Monitoring their motor voltages thus results in a system with four modalities.
Approaches we have considered
We considered several approaches to technically realize a system capable of developing SMCs. Among others, we took a closer look at Predictive State Representations (PSRs) [2], which rely on core tests (a sequence of actions followed by an observation) to model a dynamical system. The probability of success of each core test is maintained and becomes a feature of the representation of the world that can be used to predict future observations. This intuitive approach seems to match the description of SMCs very nicely. However, there are several unsolved problems: PSRs are basically descriptive and do not generate actions (planning problem), and the questions of how to identify core tests (discovery problem) and how to update their probability distribution (learning problem) are not solved satisfactorily.
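The core-test idea can be sketched in a few lines (the class names and the simple frequency-count update are our own illustration; [2] derives the test probabilities from a system-dynamics matrix instead):

```python
class CoreTest:
    """A core test: a fixed sequence of actions followed by an expected observation."""
    def __init__(self, actions, observation):
        self.actions = tuple(actions)
        self.observation = observation
        self.successes = 0   # times the expected observation followed the sequence
        self.trials = 0      # times the action sequence was executed

    def update(self, executed_actions, observed):
        """Record one execution of this test's action sequence."""
        if tuple(executed_actions) == self.actions:
            self.trials += 1
            if observed == self.observation:
                self.successes += 1

    def probability(self):
        """Estimated success probability; uninformed prior of 0.5 before any evidence."""
        return self.successes / self.trials if self.trials else 0.5


def state_vector(tests):
    """The agent's state is the vector of success probabilities of its core tests."""
    return [t.probability() for t in tests]
```

A test such as `CoreTest(["forward"], "obstacle_near")` would then capture one sensorimotor regularity, and the vector over all maintained tests serves as the world representation.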
Another method we examined uses symbolic regression: Schmidt and Lipson used an evolutionary algorithm to derive physical laws purely from observations of a system [7]. Unfortunately, this approach does not seem feasible for an online system.
System architecture
The
effect of the robot actuators on the external world produces a sensor
reading which in turn can be processed internally by the robot. The
system architecture is composed of strongly interacting modules. The
prediction machine generates a prediction of the sensor values of the next time step associated with an action. For now, an action is defined as a higher-level motor primitive, e.g. “move forward”, but in the long run it will be replaced by a low-level one, i.e. a motor voltage. The action selection module can use the prediction to choose a potential action, e.g. using ε-greedy or more sophisticated reinforcement learning techniques [8]. Once an action has been chosen and the corresponding action-perception loop has been carried out, the error between the prediction and the actual sensor reading can be computed and subsequently used to improve the prediction capabilities of the prediction machine. Besides that, the error contributes to the overall “well being” of the agent.
This is realized by relating the inaccuracy of the prediction to novelty and curiosity (cf. [5, 6]). A high error indicates that the recently encountered situation is not (well) known. Furthermore, looking at the rate of change of the error over a sequence of similar action-perception loops yields an estimate of either learning progress or frustration, which in turn can be used to influence the action selection. The “well being” can be augmented with a variety of internal and external rewards or punishments, e.g. for low power consumption, boredom or a preferred zone like a “petting zoo”. From a neuroscience point of view, the internal state of our robot is strongly related to the dopamine cell system of the midbrain, which is involved in the control of movements, the signalling of (reward) prediction errors, motivation and cognition [3].
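The interplay of ε-greedy selection, prediction error and learning progress described above can be sketched as follows (the parameter values, function names and the particular reward shaping are our own illustrative assumptions, cf. [5, 8]):

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])


def intrinsic_reward(prev_error, curr_error):
    """Reward the *decrease* of the prediction error (learning progress);
    a negative value in familiar situations corresponds to frustration."""
    return prev_error - curr_error


def update_q(q_values, action, reward, alpha=0.1):
    """Simple incremental value update for the chosen action."""
    q_values[action] += alpha * (reward - q_values[action])
```

In each action-perception loop the agent would select an action with `epsilon_greedy`, execute it, compare the prediction to the actual sensor reading, and feed the resulting learning progress back into the action values via `update_q`.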
Currently the prediction machine is realized as a recurrent neural network (RNN). However, many other methods, e.g. an SVM, could be used to learn a relation between actions and sensor readings. Another approach we are considering as a basis for the prediction machine relies on generic sensory coding principles, e.g. sparse coding, temporal coherence and predictability. Here, computational units that are identical with respect to a given objective function are combined in a hierarchical manner. Based solely on the statistical properties of the sensory modality and on the units' positions in the hierarchy (and thus their differing inputs), the information processing leads to the detection of different invariances at the respective hierarchy levels. In fact, Wyss et al. showed that a (simulated) robot exposed to a continuous visual stream develops receptive fields with properties matching the ventral visual system, at the highest level of the hierarchy even resembling the place fields observed in the entorhinal cortex [9] (despite these convincing findings, this approach might not be suitable for a fast online procedure). It remains to be determined which method and which parameterization is most suitable for our model.
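For illustration, the interface of such a prediction machine could look like the following minimal Elman-style sketch (not our actual implementation; the layer sizes are taken from the setup described below, and the training step is deliberately omitted):

```python
import numpy as np


class ElmanPredictor:
    """Minimal Elman-style RNN: maps the current sensor reading, the chosen
    action and a recurrent hidden state to a predicted next sensor reading."""

    def __init__(self, n_sensors, n_actions, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        n_in = n_sensors + n_actions + n_hidden
        self.W_h = rng.normal(0.0, 0.1, (n_hidden, n_in))   # input + recurrent weights
        self.W_o = rng.normal(0.0, 0.1, (n_sensors, n_hidden))  # output weights
        self.h = np.zeros(n_hidden)                          # hidden (context) state

    def predict(self, sensors, action_onehot):
        x = np.concatenate([sensors, action_onehot, self.h])
        self.h = np.tanh(self.W_h @ x)   # update recurrent hidden state
        return self.W_o @ self.h         # predicted sensor values of the next step


def prediction_error(predicted, observed):
    """Mean squared error between prediction and actual sensor reading."""
    return float(np.mean((predicted - observed) ** 2))
```

With nine infrared sensors and five motor primitives, a call like `ElmanPredictor(9, 5, 8).predict(sensors, action)` yields the nine-dimensional prediction whose error then drives both learning and the “well being” of the agent.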
Results
Preliminary results from a simplified system are promising. Currently, only the infrared sensors are considered, and the actions of the robot are restricted to forward, backward, left, right and stop. Experimenting with the robot in a simple environment (a box of ≈ 1 m², no obstacles) and narrowing down the possible actions even further (only forward and backward) leads to a decrease of the prediction error to satisfactorily low values after a few iterations. However, when all possible actions are permitted, this effect can no longer be observed. Instead, after an initial decrease, the error rises again and the prediction machine is not able to capture the dynamics of the system.
Discussion & Outlook
In contrast to many other approaches, no goal for our robot is predefined. Instead, the error of the prediction of the sensor readings and the evolution of the resulting behaviour can be used to monitor the success of the system. Initial results show that our system struggles when the environmental complexity increases. A potential reason for this failure could be that the increasing number of states exceeds the capacity of the RNN. However, a major problem seems to be the slip of the robot's wheels, which leads to a substantial rotation. By construction, the agent is not able to compensate for this; furthermore, it is not able to distinguish between a rotation of the world and a self-rotation. Thinking about the causes and consequences of this undesired rotation actually leads us to the heart of the problem we want to solve: mastery of SMCs should be a suitable means to compensate for the slip, and even a generic approach for various kinds of surfaces (of course, for that the robot must be capable of performing a rotation). Hence, we are currently focusing on this sub-problem.
In the long run, we plan to extend our approach with different modalities. This will help to discriminate between ambiguous situations, and the question of early versus late integration can be addressed. Furthermore, the construction of sensoritopic maps, e.g. with an information-theoretic approach, could help to investigate the relation between stimuli of different modalities. An artificial deprivation or substitution of senses during the course of an experiment is also conceivable, as is an exchange of sensorimotor laws between different simulations. Next to the active (by the robot itself) and passive (by the experimenter) manipulation of the environment, we can look for object affordances, permanence, recognition as well as identification.
Connections to other CINACS projects
This project is connected to several research projects in China and Germany within the CINACS framework. In particular, the interconnection to the project of Mario Maiworm (I.1.5.2) has to be mentioned. Mario Maiworm utilizes probabilistic models to a) predict (human) behaviour and b) integrate sensory parameters by weighting them by their reliability. The results obtained from his project can be related to our robot system in two ways. First, his models can be used to assess the “human-like” behaviour of our model (which can be seen as a sort of benchmark). Second, they can be integrated into our specific framework and (after the parameters have been adjusted) used to weight sensory information across different modalities.
Furthermore, the project of Ning Chen (I.1.3.2), dealing with multimodal data mining, shares techniques from machine learning; this common methodological basis has led to mutual stimulation and discussions during the 2009 CINACS summer school in Beijing. Although we do not want to pursue the canonical approach commonly used in technical systems (feature extraction followed by early or late sensory fusion) as it can be found in the project of Ning Chen, the results of the two methods can be compared, and some algorithms from her work, e.g. Bayesian models for time series analysis, can be valuable for the interpretation of our approach.
Since this project is a continuation of Tobias Kringe's project (I.1.1.2), it shares the same relations to other projects (see the description there).
References
[1] H. von Helmholtz. Handbuch der physiologischen Optik. Voss: Leipzig, 1867.
[2] M. Littman, R. Sutton and S. Singh. Predictive representations of state. In: Advances in Neural Information Processing Systems 14, pp. 1555-1561. MIT Press: Cambridge, MA, 2002.
[3] M. Matsumoto and O. Hikosaka. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature, 459: 837-841, 2009.
[4] J. K. O'Regan and A. Noë. A sensorimotor account of vision and visual consciousness. Behav Brain Sci, 24: 939-973, 2001.
[5] P. Y. Oudeyer, F. Kaplan, et al. Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11: 265-286, 2007.
[6] J. Schmidhuber. Curious model-building control systems. Proc. IJCNN 1991 - IEEE International Joint Conference on Neural Networks, 2: 1458-1463, Singapore, Nov 18-21, 1991.
[7] M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science, 324: 81-85, 2009.
[8] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press: Cambridge, MA, 1998.
[9] R. Wyss, P. König, et al. A model of the ventral visual system based on temporal stability and local memory. PLoS Biol, 4: e120, 2006.