Controlling the Noise Robustness of End-to-End Automatic Speech Recognition Systems
Proceedings of the International Joint Conference on Neural Networks (IJCNN 2021),
Jul 2021
In this work, we propose a novel training scheme
to modularize end-to-end systems. Our training scheme aims
at altering the flow of information in an end-to-end system to
use the kernels of this system for another system that fulfills
another task. We apply this scheme to extract the noise reduction
capabilities from a noise-robust automatic speech recognition
(ASR) system and implement a speech enhancer from it. This
enhancer receives spectral representations from unfiltered audio
and outputs cleaned spectral representations. Our enhancer can
be integrated into an ASR system as a front-end, is trainable, and
reduces background noise. Our front-end uses a decoder to clean
speech based on the hidden activations of the ASR system Jasper.
During training, we exclusively adapt the weights in our decoder
and the batch normalization in Jasper. The resulting spectral
representations show less background noise. Further, areas in the
spectral features are not reconstructed if they do not contribute
to speech recognition. We demonstrate that our front-end can
be combined with a pre-trained ASR system as a back-end and
supports speech recognition in noisy conditions. Further, we show
that training another ASR system with our front-end increases
the performance of the ASR system in noisy as well as
noiseless conditions. The ASR system's performance is especially
improved on challenging speech datasets.
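
The selective adaptation described in the abstract (only the decoder weights and the batch normalization in Jasper are updated) can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the `Param` class, the `freeze_for_enhancer_training` helper, and all parameter names below are hypothetical and serve only to show the selection rule. In a deep learning framework, the same rule would typically be applied by toggling the per-parameter gradient flag.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a named network parameter."""
    name: str               # dotted name, e.g. "jasper.block1.bn.weight"
    kind: str               # type of the owning layer: "conv", "batch_norm", ...
    trainable: bool = True


def freeze_for_enhancer_training(params):
    """Keep only decoder parameters and ASR batch-norm parameters trainable.

    Assumption: ASR parameters are prefixed "jasper." and decoder
    parameters are prefixed "decoder."; the real model layout may differ.
    """
    for p in params:
        in_decoder = p.name.startswith("decoder.")
        is_asr_batch_norm = p.name.startswith("jasper.") and p.kind == "batch_norm"
        p.trainable = in_decoder or is_asr_batch_norm
    return [p.name for p in params if p.trainable]


# Illustrative parameter list; names are invented for this example.
params = [
    Param("jasper.block1.conv.weight", "conv"),
    Param("jasper.block1.bn.weight", "batch_norm"),
    Param("jasper.block1.bn.bias", "batch_norm"),
    Param("decoder.layer1.weight", "conv"),
    Param("decoder.bn1.weight", "batch_norm"),
]

trainable_names = freeze_for_enhancer_training(params)
print(trainable_names)
```

Under these assumptions, the convolution weights of the ASR system stay frozen, while its batch-normalization parameters and every decoder parameter remain trainable.
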
@InProceedings{MTWW21,
  author    = {Möller, Matthias and Twiefel, Johannes and Weber, Cornelius and Wermter, Stefan},
  title     = {Controlling the Noise Robustness of End-to-End Automatic Speech Recognition Systems},
  booktitle = {Proceedings of the International Joint Conference on Neural Networks (IJCNN 2021)},
  editor    = {},
  number    = {},
  volume    = {},
  pages     = {},
  year      = {2021},
  month     = {Jul},
  publisher = {},
  doi       = {},
}