Incorporating End-to-End Speech Recognition Models for Sentiment Analysis
2019 IEEE International Conference on Robotics and Automation (ICRA),
pages 7976--7982,
doi: 10.1109/ICRA.2019.8794468
May 2019
Previous work on emotion recognition demonstrated a synergistic effect of combining several modalities such
as auditory, visual, and transcribed text to estimate the affective
state of a speaker. Among these, the linguistic modality is crucial
for the evaluation of an expressed emotion. However, manually
transcribed spoken text cannot be given as input to a system
practically. We argue that using ground-truth transcriptions
during training and evaluation phases leads to a significant
discrepancy in performance compared to real-world conditions,
as the spoken text has to be recognized on the fly and can
contain speech recognition mistakes. In this paper, we propose
a method of integrating an automatic speech recognition (ASR)
output with a character-level recurrent neural network for sentiment recognition. In addition, we conduct several experiments
investigating sentiment recognition for human-robot interaction
in a noise-realistic scenario which is challenging for the ASR
systems. We quantify the improvement compared to using only
the acoustic modality in sentiment recognition. We demonstrate
the effectiveness of this approach on the Multimodal Corpus
of Sentiment Intensity (MOSI) by achieving 73.6% accuracy
in a binary sentiment classification task, exceeding previously
reported results that use only acoustic input. In addition, we
set a new state-of-the-art performance on the MOSI dataset
(80.4% accuracy, 2% absolute improvement).
@InProceedings{LZWMW19,
  author    = {Lakomkin, Egor and Zamani, Mohammad Ali and Weber, Cornelius and Magg, Sven and Wermter, Stefan},
  title     = {Incorporating End-to-End Speech Recognition Models for Sentiment Analysis},
  booktitle = {2019 IEEE International Conference on Robotics and Automation (ICRA)},
  pages     = {7976--7982},
  year      = {2019},
  month     = {May},
  doi       = {10.1109/ICRA.2019.8794468},
}