Automatically augmenting an emotion dataset improves classification using audio
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics,
Volume 2,
pages 194--197,
Apr 2017.
arXiv:1803.11506
In this work, we tackle the problem of speech emotion classification. One of the issues in the area of affective computing is that the amount of annotated data is very limited. On the other hand, the number of ways that the same emotion can be expressed verbally is enormous due to variability between speakers. This is one of the factors that limit performance and generalization. We propose a simple method that extracts audio samples from movies using textual sentiment analysis. As a result, it is possible to automatically construct a larger dataset of audio samples with positive, negative, and neutral emotional speech. We show that pretraining a recurrent neural network on such a dataset yields better results on the challenging EmotiW corpus. This experiment shows a potential benefit of combining textual sentiment analysis with vocal information.
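
The extraction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that the movie text comes from SRT-style subtitles, and it substitutes a tiny hand-made word list (`POSITIVE`, `NEGATIVE`) for a real sentiment analyzer. All names here (`Segment`, `label_text`, `segments_from_srt`) are hypothetical. The output is a list of labeled time spans that could later be cut from the audio track.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would use a trained textual sentiment model.
POSITIVE = {"love", "great", "happy", "wonderful", "thank"}
NEGATIVE = {"hate", "terrible", "sad", "awful", "sorry"}

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


@dataclass
class Segment:
    start: float  # seconds from the beginning of the movie
    end: float
    text: str
    label: str    # "positive", "negative", or "neutral"


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds."""
    h, m, s, ms = TIME.match(stamp.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def label_text(text: str) -> str:
    """Label text by counting lexicon hits (a stand-in for sentiment analysis)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def segments_from_srt(srt: str) -> List[Segment]:
    """Parse SRT text into labeled time spans."""
    segments = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        # Expected block layout: index line, timing line, one or more text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        text = " ".join(lines[2:])
        segments.append(
            Segment(to_seconds(start), to_seconds(end), text, label_text(text))
        )
    return segments
```

Each returned `Segment` carries the start and end time of one subtitle, so an audio tool of choice could cut the matching clip and store it under the segment's label.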

@InProceedings{LWW17,
  author    = {Lakomkin, Egor and Weber, Cornelius and Wermter, Stefan},
  title     = {Automatically augmenting an emotion dataset improves classification using audio},
  booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics},
  editors   = {},
  number    = {},
  volume    = {2},
  pages     = {194--197},
  year      = {2017},
  month     = {Apr},
  publisher = {},
  doi       = {ARXIV:1803.11506},
}