Long short-term memory

Long short-term memory (LSTM)^[1] is a type of recurrent neural network (RNN) aimed at mitigating the vanishing gradient problem^[2] commonly encountered by traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps (thus "long short-term memory").^[1] The name is made in analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since the early 20th century.

An LSTM unit is typically composed of a cell and three gates: an input gate, an output gate,^[3] and a forget gate.^[4] The cell remembers values over arbitrary time intervals, and the gates regulate the flow of information into and out of the cell. Forget gates decide what information to discard from the previous state, by mapping the previous state and the current input to a value between 0 and 1. A (rounded) value of 1 signifies retention of the information, and a value of 0 represents discarding. Input gates decide which pieces of new information to store in the current cell state, using the same system as forget gates. Output gates control which pieces of information in the current cell state to output, by assigning a value from 0 to 1 to the information, considering the previous and current states. Selectively outputting relevant information from the current state allows the LSTM network to maintain useful, long-term dependencies to make predictions, both in current and future time-steps.

LSTM has wide applications in classification,^[5]^[6] data processing, time series analysis tasks,^[7] speech recognition,^[8]^[9] machine translation,^[10]^[11] speech activity detection,^[12] robot control,^[13]^[14] video games,^[15]^[16] and healthcare.^[17]

^ ^a ^b Cite error: The named reference lstm1997 was invoked but never defined (see the help page).
^ Cite error: The named reference hochreiter1991 was invoked but never defined (see the help page).
^ Hochreiter, Sepp; Schmidhuber, Jürgen (1996-12-03). "LSTM can solve hard long time lag problems". Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS'96. Cambridge, MA, USA: MIT Press: 473–479.
^ Cite error: The named reference lstm2000 was invoked but never defined (see the help page).
^ Cite error: The named reference graves2006 was invoked but never defined (see the help page).
^ Karim, Fazle; Majumdar, Somshubra; Darabi, Houshang; Chen, Shun (2018). "LSTM Fully Convolutional Networks for Time Series Classification". IEEE Access. 6: 1662–1669. arXiv:1709.05206. doi:10.1109/ACCESS.2017.2779939. ISSN 2169-3536.
^ Cite error: The named reference wierstra2005 was invoked but never defined (see the help page).
^ Sak, Hasim; Senior, Andrew; Beaufays, Francoise (2014). "Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling" (PDF). Archived from the original (PDF) on 2018-04-24.
^ Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
^ Wu, Yonghui; Schuster, Mike; Chen, Zhifeng; Le, Quoc V.; Norouzi, Mohammad; Macherey, Wolfgang; Krikun, Maxim; Cao, Yuan; Gao, Qin (2016-09-26). "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL].
^ Ong, Thuy (4 August 2017). "Facebook's translations are now powered completely by AI". www.allthingsdistributed.com. Retrieved 2019-02-15.
^ Sahidullah, Md; Patino, Jose; Cornell, Samuele; Yin, Ruiking; Sivasankaran, Sunit; Bredin, Herve; Korshunov, Pavel; Brutti, Alessio; Serizel, Romain; Vincent, Emmanuel; Evans, Nicholas; Marcel, Sebastien; Squartini, Stefano; Barras, Claude (2019-11-06). "The Speed Submission to DIHARD II: Contributions & Lessons Learned". arXiv:1911.02388 [eess.AS].
^ Cite error: The named reference mayer2006 was invoked but never defined (see the help page).
^ "Learning Dexterity". OpenAI. July 30, 2018. Retrieved 2023-06-28.
^ Rodriguez, Jesus (July 2, 2018). "The Science Behind OpenAI Five that just Produced One of the Greatest Breakthrough in the History of AI". Towards Data Science. Archived from the original on 2019-12-26. Retrieved 2019-01-15.
^ Stanford, Stacy (January 25, 2019). "DeepMind's AI, AlphaStar Showcases Significant Progress Towards AGI". Medium ML Memoirs. Retrieved 2019-01-15.
^ Schmidhuber, Jürgen (2021). "The 2010s: Our Decade of Deep Learning / Outlook on the 2020s". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.

[lstm1997-1] Cite error: The named reference lstm1997 was invoked but never defined (see the help page).

[hochreiter1991-2] Cite error: The named reference hochreiter1991 was invoked but never defined (see the help page).

[hochreiter1996-3] Hochreiter, Sepp; Schmidhuber, Jürgen (1996-12-03). "LSTM can solve hard long time lag problems". Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS'96. Cambridge, MA, USA: MIT Press: 473–479.

[lstm2000-4] Cite error: The named reference lstm2000 was invoked but never defined (see the help page).

[graves2006-5] Cite error: The named reference graves2006 was invoked but never defined (see the help page).

[6] Karim, Fazle; Majumdar, Somshubra; Darabi, Houshang; Chen, Shun (2018). "LSTM Fully Convolutional Networks for Time Series Classification". IEEE Access. 6: 1662–1669. arXiv:1709.05206. doi:10.1109/ACCESS.2017.2779939. ISSN 2169-3536.

[wierstra2005-7] Cite error: The named reference wierstra2005 was invoked but never defined (see the help page).

[sak2014-8] Sak, Hasim; Senior, Andrew; Beaufays, Francoise (2014). "Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling" (PDF). Archived from the original (PDF) on 2018-04-24.

[liwu2015-9] Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].

[GoogleTranslate-10] Wu, Yonghui; Schuster, Mike; Chen, Zhifeng; Le, Quoc V.; Norouzi, Mohammad; Macherey, Wolfgang; Krikun, Maxim; Cao, Yuan; Gao, Qin (2016-09-26). "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL].

[FacebookTranslate-11] Ong, Thuy (4 August 2017). "Facebook's translations are now powered completely by AI". www.allthingsdistributed.com. Retrieved 2019-02-15.

[12] Sahidullah, Md; Patino, Jose; Cornell, Samuele; Yin, Ruiking; Sivasankaran, Sunit; Bredin, Herve; Korshunov, Pavel; Brutti, Alessio; Serizel, Romain; Vincent, Emmanuel; Evans, Nicholas; Marcel, Sebastien; Squartini, Stefano; Barras, Claude (2019-11-06). "The Speed Submission to DIHARD II: Contributions & Lessons Learned". arXiv:1911.02388 [eess.AS].

[mayer2006-13] Cite error: The named reference mayer2006 was invoked but never defined (see the help page).

[OpenAIhand-14] "Learning Dexterity". OpenAI. July 30, 2018. Retrieved 2023-06-28.

[OpenAIfive-15] Rodriguez, Jesus (July 2, 2018). "The Science Behind OpenAI Five that just Produced One of the Greatest Breakthrough in the History of AI". Towards Data Science. Archived from the original on 2019-12-26. Retrieved 2019-01-15.

[alphastar-16] Stanford, Stacy (January 25, 2019). "DeepMind's AI, AlphaStar Showcases Significant Progress Towards AGI". Medium ML Memoirs. Retrieved 2019-01-15.

[decade2022-17] Schmidhuber, Jürgen (2021). "The 2010s: Our Decade of Deep Learning / Outlook on the 2020s". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

Long short-term memory

From Wikipedia, the free encyclopedia · View on Wikipedia