Learning Sparse Hidden States in Long Short-Term Memory

Niange Yu, Cornelius Weber, Xiaolin Hu
Artificial Neural Networks and Machine Learning – ICANN 2019, Sep 2019. arXiv:1709.05027 (Open Access)
Long Short-Term Memory (LSTM) is a powerful recurrent neural network architecture that is successfully used in many sequence modeling applications. Inside an LSTM unit, a vector called the "memory cell" stores the history of the sequence. Another important vector, the hidden state, works alongside the memory cell and is used to make the prediction at each step. The memory cell records the entire history, whereas the hidden state at a specific time step generally needs to attend to only a very limited part of that information. There is therefore an imbalance between the large amount of information carried by the memory cell and the small amount requested by the hidden state at a specific step. We propose to explicitly impose sparsity on the hidden states to adapt them to the required information. Extensive experiments show that this sparsity reduces computational complexity and improves the performance of LSTM networks.
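The sketch below is a minimal PyTorch illustration of the general idea of encouraging sparse hidden states, assuming the sparsity is imposed through an L1 penalty on the hidden state at every step that is added to the task loss. The class name SparseHiddenLSTM, the penalty weight sparsity_weight, and the L1 formulation are assumptions made for illustration only and are not necessarily the method used in the paper.

# Illustrative sketch only: one plausible way to encourage sparse hidden
# states, via an L1 penalty on h_t added to the task loss. The paper's exact
# method may differ; names and the weight `sparsity_weight` are assumptions.
import torch
import torch.nn as nn


class SparseHiddenLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, sparsity_weight=1e-3):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.sparsity_weight = sparsity_weight

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        batch = x.size(1)
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        outputs = []
        l1_penalty = x.new_zeros(())
        for x_t in x:
            h, c = self.cell(x_t, (h, c))             # c carries the full history
            l1_penalty = l1_penalty + h.abs().mean()  # push h toward sparsity
            outputs.append(h)
        return torch.stack(outputs), self.sparsity_weight * l1_penalty


# Usage: add the returned penalty to the task loss before backpropagation.
model = SparseHiddenLSTM(input_size=32, hidden_size=128)
seq = torch.randn(10, 4, 32)
outputs, penalty = model(seq)
loss = outputs.pow(2).mean() + penalty  # dummy task loss for illustration
loss.backward()

In this formulation the memory cell c is left untouched, so the full history is preserved, while the penalty only drives the per-step hidden state h toward a sparse representation of the information actually needed for the prediction.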


@InProceedings{YWH19, 
 	 author = {Yu, Niange and Weber, Cornelius and Hu, Xiaolin},  
 	 title = {Learning Sparse Hidden States in Long Short-Term Memory}, 
 	 booktitle = {Artificial Neural Networks and Machine Learning – ICANN 2019},
 	 year = {2019},
 	 month = {Sep},
 	 eprint = {1709.05027},
 	 archivePrefix = {arXiv},
 }