On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
pages 854--860,
Oct 2018,
doi: 10.1109/IROS.2018.8593571
Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received considerable attention from the research community. For example, many neural network-based architectures have been proposed recently and have pushed performance to a new level. However, the applicability of such neural SER models, trained only on in-domain data, to noisy conditions is currently under-researched. In this work, we evaluate the robustness of state-of-the-art neural acoustic emotion recognition models in human-robot interaction scenarios. We hypothesize that a robot's ego noise, room conditions, and various acoustic events that can occur in a home environment can significantly affect the performance of a model. We conduct several experiments on the iCub robot platform and propose several novel ways to reduce the gap between the model's performance during training and testing in real-world conditions. Furthermore, we observe large improvements in model performance on the robot and demonstrate the necessity of introducing several data augmentation techniques, such as overlaying background noise and varying loudness, to improve the robustness of the neural approaches.
@InProceedings{LZWMW18a,
  author    = {Lakomkin, Egor and Zamani, Mohammad Ali and Weber, Cornelius and Magg, Sven and Wermter, Stefan},
  title     = {On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks},
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages     = {854--860},
  year      = {2018},
  month     = {Oct},
  publisher = {IEEE},
  doi       = {10.1109/IROS.2018.8593571},
}