EmoRL: Real-time Acoustic Emotion Classification using Deep Reinforcement Learning
Proceedings of the International Conference on Robotics and Automation (ICRA), pages 4445–4450, May 2018.
doi: 10.1109/ICRA.2018.8461058
Acoustically expressed emotions can make communication with a robot more efficient. Detecting emotions such as anger can signal unsafe or undesired situations to the robot. Recently, several deep neural network-based models have been proposed that establish new state-of-the-art results in affective state evaluation. These models typically start processing only at the end of each utterance, which not only requires a mechanism to detect the end of an utterance but also makes them difficult to use in real-time communication scenarios such as human-robot interaction. We propose the EmoRL model, which triggers an emotion classification as soon as it gains enough confidence while listening to a person speaking. As a result, we minimize the need for segmenting the audio signal for classification and achieve lower latency, as the audio signal is processed incrementally. The method is competitive with the accuracy of a strong baseline model while allowing much earlier prediction.
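To make the early-trigger idea concrete, the sketch below simulates incremental classification in which a fixed confidence threshold stands in for EmoRL's learned reinforcement-learning policy, and toy per-frame evidence stands in for the acoustic encoder. The function names, emotion set, and threshold value are illustrative assumptions, not the authors' implementation.

import numpy as np

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def incremental_posterior(frame_scores):
    """Posterior over EMOTIONS given the evidence heard so far.
    In the paper this role is played by a neural encoder consuming
    acoustic features; here we simply sum toy per-frame scores."""
    return softmax(np.sum(frame_scores, axis=0))

def early_trigger(stream, threshold=0.9, max_frames=100):
    """Listen frame by frame and emit a label as soon as confidence
    (the max posterior probability) exceeds `threshold`. EmoRL learns
    this stop/continue decision with RL; a fixed rule stands in here."""
    seen = []
    for t, frame in enumerate(stream, start=1):
        seen.append(frame)
        probs = incremental_posterior(np.array(seen))
        if probs.max() >= threshold or t >= max_frames:
            return EMOTIONS[int(probs.argmax())], t, probs.max()
    probs = incremental_posterior(np.array(seen))
    return EMOTIONS[int(probs.argmax())], len(seen), probs.max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated utterance: noisy per-frame evidence favoring "anger".
    bias = np.zeros(len(EMOTIONS))
    bias[EMOTIONS.index("anger")] = 0.4
    stream = (rng.normal(scale=0.5, size=len(EMOTIONS)) + bias for _ in range(100))
    label, frames_used, conf = early_trigger(stream)
    print(f"predicted '{label}' after {frames_used} frames (confidence {conf:.2f})")

Raising `threshold` delays the decision in exchange for higher confidence, mirroring the latency/accuracy trade-off the abstract describes.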
@InProceedings{LZWMW18,
  author    = {Lakomkin, Egor and Zamani, Mohammad Ali and Weber, Cornelius and Magg, Sven and Wermter, Stefan},
  title     = {EmoRL: Real-time Acoustic Emotion Classification using Deep Reinforcement Learning},
  booktitle = {Proceedings of the International Conference on Robotics and Automation (ICRA)},
  pages     = {4445--4450},
  year      = {2018},
  month     = {May},
  doi       = {10.1109/ICRA.2018.8461058}
}