The Effects on Adaptive Behaviour of Negatively Valenced Signals in Reinforcement Learning

Nicolás Navarro-Guerrero, Robert Lowe, Stefan Wermter

Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EPIROB), Volume 7, pages 148-155, doi: 10.1109/DEVLRN.2017.8329800 - Sep 2017

Reinforcement learning algorithms, particularly those based on temporal-difference learning, are widely adopted and have been successfully applied to a number of problems as well as used to model animal learning. However, they are based on neural pathways involved in reward-seeking behaviour, since little is known about punishment-driven learning and still less about the combined effects of both types of reinforcement on learning. This may be a shortcoming not only for computational models of human and animal learning; we have recently shown that it may also have detrimental effects on task performance and convergence speed in machine learning applications. Here, we further explore our original results and compare the effects on learning of different punishment functions, i.e. binary, linear, and exponential with different variances. Our experiments confirm the original finding that punishment signals reduce learning speed. This result appears to generalize across a number of different punishment reinforcement functions.
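This page does not specify the task or the exact form of the punishment functions, so the following is only a hypothetical sketch of how binary, linear and exponential punishment shapes could feed a temporal-difference learner. Assumptions (none taken from the paper): punishment is a function of a normalised "danger" level d in [0, 1]; the exponential variant is Gaussian-shaped with width `sigma`, standing in for "exponential with different variance"; and the punishment is simply added to the reward in a tabular TD(0) update. All function and parameter names are illustrative.

```python
import math

# Hypothetical punishment shapes. Each maps an assumed "danger" level d in [0, 1]
# (e.g. closeness to an obstacle; an illustrative assumption, not from the paper)
# to a non-positive reinforcement signal scaled by max_penalty.


def binary_punishment(d, max_penalty=1.0, threshold=0.5):
    """Full penalty once d reaches the threshold, otherwise none."""
    return -max_penalty if d >= threshold else 0.0


def linear_punishment(d, max_penalty=1.0):
    """Penalty grows in proportion to d."""
    return -max_penalty * d


def exponential_punishment(d, max_penalty=1.0, sigma=0.2):
    """Gaussian-shaped penalty: maximal at d = 1, decaying as d moves away from 1.

    sigma sets the width of the curve; a larger sigma spreads the penalty
    over a wider range of d (the assumed 'variance' knob).
    """
    return -max_penalty * math.exp(-((1.0 - d) ** 2) / (2.0 * sigma ** 2))


def td0_update(V, s, s_next, reward, punishment, alpha=0.1, gamma=0.9):
    """Tabular TD(0) step in which punishment is added to the reward.

    V is a dict mapping states to value estimates; it is updated in place.
    Returns the TD error delta = r + gamma * V[s_next] - V[s].
    """
    r = reward + punishment
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta
```

For example, with `V = {"a": 0.0, "b": 0.0}`, calling `td0_update(V, "a", "b", reward=0.0, punishment=linear_punishment(0.5))` returns a TD error of -0.5 and sets `V["a"]` to -0.05 with the default `alpha = 0.1`. Summing reward and punishment into one scalar is only one design choice; a model could instead keep separate value estimates for the two signals.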

@InProceedings{NLW17a, 
 	 author =  {Navarro-Guerrero, Nicolás and Lowe, Robert and Wermter, Stefan},  
 	 title = {The Effects on Adaptive Behaviour of Negatively Valenced Signals in Reinforcement Learning}, 
 	 booktitle = {Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EPIROB)},
 	 volume = {7},
 	 pages = {148--155},
 	 year = {2017},
 	 month = {Sep},
 	 publisher = {IEEE},
 	 doi = {10.1109/DEVLRN.2017.8329800}, 
 }