Embodied Language Learning with Paired Variational Autoencoders

Ozan Özdemir, Matthias Kerzel, Stefan Wermter

2021 IEEE International Conference on Development and Learning (ICDL), pages 1--6, doi: 10.1109/ICDL49984.2021.9515668 - Aug 2021

Language acquisition is an integral part of developmental robotics, which aims to understand the key components of human development and learning and to utilise them in artificial agents. Like human infants, robots can learn language while interacting with objects in their environment and receiving linguistic input. This process, known as embodied language learning, can enhance language acquisition in robots through multiple modalities, including visual and sensorimotor input. In this work, we explore ways to translate a simple action in a tabletop environment into various linguistic commands, building on an existing approach that uses multiple autoencoders. While the existing approach focuses on strict one-to-one mappings between actions and descriptions by implicitly binding two standard autoencoders in the latent space, we propose a variational autoencoder model to facilitate one-to-many mappings between actions and descriptions. Additionally, for extracting visual features, we employ channel-separated convolutional autoencoders to better handle complex visual input. The results show that our model outperforms the existing approach in associating multiple commands with the corresponding action.

YouTube video: https://youtu.be/b65KyzLtqKQ

@InProceedings{OKW21,
  author    = {Özdemir, Ozan and Kerzel, Matthias and Wermter, Stefan},
  title     = {Embodied Language Learning with Paired Variational Autoencoders},
  booktitle = {2021 IEEE International Conference on Development and Learning (ICDL)},
  pages     = {1--6},
  year      = {2021},
  month     = {Aug},
  publisher = {IEEE},
  doi       = {10.1109/ICDL49984.2021.9515668},
}