Learning Sparse Hidden States in Long Short-Term Memory
Artificial Neural Networks and Machine Learning – ICANN 2019,
arXiv: 1709.05027
- Sep 2019
Long Short-Term Memory (LSTM) is a powerful recurrent
neural network architecture that is successfully used in many sequence
modeling applications. Inside an LSTM unit, a vector called the memory
cell is used to memorize the history. Another important vector, which
works along with the memory cell, represents the hidden states and is used
to make a prediction at a specific time step. The memory cell records the entire
history, while the hidden states at a specific time step generally need to
attend only to a very limited part of it. There is therefore an imbalance
between the large amount of information carried by the memory cell
and the small amount of information requested by the hidden states at
a specific step. We propose to explicitly impose sparsity on the hidden
states to adapt them to the required information. Extensive experiments
show that sparsity reduces the computational complexity and improves
the performance of LSTM networks.
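As a rough illustration of the idea (not the paper's exact mechanism), the sketch below imposes sparsity on an LSTM's per-step hidden states with a simple L1 penalty added to the training loss. The module name SparseHiddenLSTM and the hyperparameter l1_weight are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn as nn

class SparseHiddenLSTM(nn.Module):
    """LSTM whose per-step hidden states are encouraged to be sparse via an L1 penalty."""

    def __init__(self, input_size, hidden_size, output_size, l1_weight=1e-4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, output_size)
        self.l1_weight = l1_weight

    def forward(self, x):
        # hidden: (batch, seq_len, hidden_size), the hidden states h_t at every step
        hidden, _ = self.lstm(x)
        logits = self.head(hidden)
        # L1 penalty pushes many entries of the hidden states toward zero
        sparsity_penalty = self.l1_weight * hidden.abs().mean()
        return logits, sparsity_penalty

# Usage sketch: add the penalty to the task loss during training.
model = SparseHiddenLSTM(input_size=32, hidden_size=128, output_size=10)
x = torch.randn(4, 20, 32)                       # (batch, time, features)
targets = torch.randint(0, 10, (4, 20))          # per-step class labels
logits, penalty = model(x)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10), targets.reshape(-1)) + penalty
loss.backward()
```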
@InProceedings{YWH19,
  author        = {Yu, Niange and Weber, Cornelius and Hu, Xiaolin},
  title         = {Learning Sparse Hidden States in Long Short-Term Memory},
  booktitle     = {Artificial Neural Networks and Machine Learning – ICANN 2019},
  year          = {2019},
  month         = {Sep},
  eprint        = {1709.05027},
  archiveprefix = {arXiv},
}