Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics
The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), doi: 10.21437/Interspeech.2019-2845, Sep 2019
Unsupervised learning represents an important opportunity for obtaining useful speech representations. Recently, variational autoencoders (VAEs) have been shown to extract useful representations in an unsupervised manner. These models are usually not designed to explicitly disentangle specific sources of information. When processing sequential data that involves multi-timescale information, however, disentanglement can be beneficial. In this paper, we address this issue by developing a predictive auxiliary variational autoencoder to obtain speech representations at different timescales. We present an auxiliary lower bound, which is used to develop a model that we call the Predictive Aux-VAE. The model is designed to disentangle global from local information using a dedicated auxiliary variable. Learned representations are analysed with respect to their ability to capture global speech characteristics. We observe that representations of individual speakers are well separated in the latent space and can successfully be used in a subsequent speaker identification task, where they achieve high classification accuracy, comparable to that of a fully supervised model. Moreover, manipulating the global variable allows global characteristics to be changed while retaining the local content during generation, which demonstrates the success of our model in disentangling global from local information.
@InProceedings{SLWW19,
  author    = {Springenberg, Sebastian and Lakomkin, Egor and Weber, Cornelius and Wermter, Stefan},
  title     = {Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics},
  booktitle = {The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)},
  year      = {2019},
  month     = {Sep},
  publisher = {IEEE},
  doi       = {10.21437/Interspeech.2019-2845},
}