Toward a goal-directed construction of state spaces
Sohrab Saeb *(1), Cornelius Weber (1)
(1) Frankfurt Institute for Advanced Studies, Frankfurt, Germany
* saeb@fias.uni-frankfurt.de
Reinforcement learning of complex tasks presents at least two major problems. The first is caused by sensory data that are irrelevant to the task. It is a waste of computational resources for an intelligent system to represent irrelevant information: the state space becomes high-dimensional and learning becomes too slow. It is therefore important to represent only the relevant data. Unsupervised learning methods such as independent component analysis can be used to encode the state space [1]. While these methods can separate relevant from irrelevant sources of information under certain conditions, they nevertheless represent all data.
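As a minimal illustration of this point (not part of the original model), the following sketch implements a symmetric FastICA with a tanh nonlinearity on a toy mixture of one "relevant" and one "irrelevant" source; all names and parameters here are illustrative. The unmixing succeeds, but both components, relevant and irrelevant alike, remain in the learned representation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian sources: one task-relevant, one a distractor.
n = 5000
s = np.vstack([rng.laplace(size=n), rng.laplace(size=n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # mixing matrix
x = A @ s                               # observed sensory data

# Whiten the observations (zero mean, identity covariance).
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
x_w = E @ np.diag(d ** -0.5) @ E.T @ x

# Symmetric FastICA with the tanh nonlinearity.
W = rng.standard_normal((2, 2))
for _ in range(200):
    g = np.tanh(W @ x_w)
    g_prime = 1.0 - g ** 2
    W_new = (g @ x_w.T) / n - np.diag(g_prime.mean(axis=1)) @ W
    u, _, vt = np.linalg.svd(W_new)  # symmetric decorrelation
    W = u @ vt

# Recovered sources: the relevant AND the irrelevant one are both kept,
# since ICA is driven purely by input statistics, not by the task.
y = W @ x_w
```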
The second problem arises when information about the environment is incomplete, as in so-called partially observable Markov decision processes. This leads to the perceptual aliasing problem, where different world states appear the same to the agent even though different decisions have to be made in each of them. To overcome this problem, the agent must continually estimate the current state, taking previous information into account. This estimation is traditionally performed with Bayesian approaches such as Kalman filters and hidden Markov models [2].
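A minimal sketch of such a Bayesian belief update (illustrative, not the model proposed here): two world states emit identical observations, so a purely reactive agent cannot distinguish them, but a belief propagated through the transition model can.

```python
import numpy as np

T = np.array([[0.0, 1.0],    # transition model P(s'|s): the two states alternate
              [1.0, 0.0]])
O = np.array([[0.5, 0.5],    # observation model P(o|s): both states emit the
              [0.5, 0.5]])   # same distribution -- complete perceptual aliasing

def belief_update(b, T, O, o):
    """One step of Bayesian filtering: predict with the transition model,
    then weight by the observation likelihood and renormalize."""
    b_pred = T.T @ b          # P(s') = sum_s P(s'|s) b(s)
    b_new = O[:, o] * b_pred  # multiply by P(o|s')
    return b_new / b_new.sum()

b = np.array([1.0, 0.0])           # the agent starts knowing it is in state 0
b = belief_update(b, T, O, o=0)    # the observation itself is uninformative,
# yet the belief now correctly identifies state 1 via the known dynamics
```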
The above-mentioned methods for solving these two problems are merely based on the statistics of sensory data without considering any goal-directed behaviour. Recent findings from biology suggest an influence of the dopaminergic system on even early sensory representations, which indicates a strong task influence [3,4]. Our goal is to model such effects in a reinforcement learning approach.
Standard reinforcement learning methods often involve a pre-defined state space. In this study, we extend the traditional reinforcement learning methodology by incorporating a feature detection stage and a predictive network, which together define the state space of the agent. The predictive network learns to predict the current state based on the previous state and the previously chosen action, i.e. it becomes a forward model. We present a temporal-difference-based learning rule for training the weight parameters of these additional network components. The simulation results show that the performance of the network is maintained both in the presence of task-irrelevant features and in the case of a non-Markovian environment, where the input is invisible at randomly occurring time steps.
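The interplay of forward model and temporal-difference learning can be sketched with a toy tabular example. This is a hypothetical stand-in, not the network or learning rule of the paper: a linear forward model F learns transitions of a small chain world while TD(0) learns state values, and when the input is occluded, F's prediction can substitute for the missing observation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy chain world: states 0..4, actions {0: left, 1: right}; reward at state 4.
n_states, n_actions = 5, 2
gamma, alpha, eta = 0.9, 0.1, 0.5

# Forward model: predicts a one-hot next-state estimate from the previous
# state-action pair (a linear, tabular stand-in for a predictive network).
F = np.zeros((n_states * n_actions, n_states))
V = np.zeros(n_states)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

for episode in range(500):
    s = 0
    for t in range(20):
        a = rng.integers(n_actions)
        s2, r = step(s, a)
        sa = s * n_actions + a
        # Train the forward model on the observed transition.
        F[sa] += eta * (np.eye(n_states)[s2] - F[sa])
        # TD(0) update of the value function.
        V[s] += alpha * (r + gamma * V[s2] - V[s])
        s = s2
        if r > 0:
            break

# If the input is invisible at this time step, the forward model's
# prediction for (state 3, action right) replaces the missing observation.
s_pred = F[3 * n_actions + 1].argmax()
```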
The model presents a link between reinforcement learning, feature detection and predictive networks and may help to explain how the dopaminergic system recruits cortical circuits for goal-directed feature detection and prediction.
References:
[1] Independent component analysis, a new concept? P. Comon. Signal Processing, 36(3):287-314 (1994).
[2] Planning and acting in partially observable stochastic domains. L. P. Kaelbling, M. L. Littman and A. R. Cassandra. Artificial Intelligence, 101:99-134 (1998).
[3] Practising orientation identification improves orientation coding in V1 neurons. A. Schoups, R. Vogels, N. Qian and G. Orban. Nature, 412:549-553 (2001).
[4] Reward-dependent modulation of working memory in lateral prefrontal cortex. S. W. Kennerley and J. D. Wallis. J. Neurosci., 29(10):3259-3270 (2009).
@InProceedings{SW09,
  author    = {Saeb, Sohrab and Weber, Cornelius},
  title     = {Toward a goal-directed construction of state spaces},
  booktitle = {Proc. Bernstein Conference on Computational Neuroscience},
  year      = {2009},
}