Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics

The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), doi: 10.21437/Interspeech.2019-2845 - Sep 2019 Open Access
Unsupervised learning represents an important opportunity for obtaining useful speech representations. Recently, variational autoencoders (VAEs) have been shown to extract useful representations in an unsupervised manner. These models are usually not designed to explicitly disentangle specific sources of information. When processing sequential data that involves multi-timescale information, however, disentanglement can be beneficial. In this paper, we address this issue by developing a predictive auxiliary variational autoencoder to obtain speech representations at different timescales. We present an auxiliary lower bound, which is used to develop a model that we call the Predictive Aux-VAE. The model is designed to disentangle global from local information by means of a dedicated auxiliary variable. Learned representations are analysed with respect to their ability to capture global speech characteristics. We observe that representations of individual speakers are well separated in the latent space and can be used successfully in a subsequent speaker identification task, where they achieve high classification accuracy comparable to that of a fully supervised model. Moreover, manipulating the global variable makes it possible to change global characteristics while retaining the local content during generation, which demonstrates that our model succeeds in disentangling global from local information.

 

@InProceedings{SLWW19, 
 	 author =  {Springenberg, Sebastian and Lakomkin, Egor and Weber, Cornelius and Wermter, Stefan},  
 	 title = {Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics}, 
 	 booktitle = {The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)},
 	 year = {2019},
 	 month = {Sep},
 	 publisher = {ISCA},
 	 doi = {10.21437/Interspeech.2019-2845}, 
 }