Understanding auditory representations of emotional expressions with neural networks
Neural Computing and Applications,
Volume 32,
Number 4,
pages 1007–1022,
doi: 10.1007/s00521-018-3869-3
- Feb 2020
<p>
In contrast to many established emotion recognition systems, convolutional neural networks do not rely on handcrafted
features to categorize emotions. Although these networks achieve state-of-the-art performance, it is still not fully understood what they
learn and how the learned representations correlate with the emotional characteristics of speech. The aim of this
work is to contribute to a deeper understanding of the acoustic and prosodic features that are relevant for the perception of
emotional states. First, we propose a deep neural network architecture that learns auditory features directly
from the raw, unprocessed speech signal. Second, we introduce two novel methods for analyzing the implicitly
learned representations, based on data-driven and network-driven visualization techniques. Using these methods, we
identify how the network categorizes an audio signal within a two-dimensional representation of emotions, namely valence and
arousal. The proposed approach is a general method that enables a deeper analysis and understanding of the representations
most relevant for perceiving emotional expressions in speech.
</p>
@Article{WBHW20,
  author    = {Wieser, Iris and Barros, Pablo and Heinrich, Stefan and Wermter, Stefan},
  title     = {Understanding auditory representations of emotional expressions with neural networks},
  journal   = {Neural Computing and Applications},
  number    = {4},
  volume    = {32},
  pages     = {1007--1022},
  year      = {2020},
  month     = {Feb},
  publisher = {Springer London},
  doi       = {10.1007/s00521-018-3869-3},
}