Highway network

In machine learning, a highway network is a neural network architecture designed to make very deep networks easier to optimize. Highway networks use learned gating mechanisms to regulate information flow, an idea inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The gates give the network paths along which information can pass across many layers with little attenuation ("information highways").[1][2]

Highway networks have been used as part of text sequence labeling and speech recognition tasks.[3][4]


Model

The model combines the layer's usual transformation H(W_H, x) with two gates: the transform gate T(W_T, x) and the carry gate C(W_C, x). Both gates are non-linear transfer functions (by convention the sigmoid function), so their outputs lie between 0 and 1. The function H(W_H, x) can be any desired transfer function.

The carry gate is defined as C(W_C, x) = 1 - T(W_T, x), so the transform gate, a layer with a sigmoid transfer function, determines both.
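
As a rough illustration of the two gates, the following Python sketch computes T as an element-wise sigmoid of an affine map and derives C from it. The bias term b_T, the helper names, and all numeric values are assumptions made for this example; they are not taken from the description above.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def transform_gate(W_T, b_T, x):
    """T(W_T, x): element-wise sigmoid of W_T x + b_T (b_T is an assumed bias)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_T, b_T)]

def carry_gate(t):
    """C = 1 - T, so the two gates sum to one for every unit."""
    return [1.0 - ti for ti in t]

# Illustrative input and parameters (hypothetical values).
x = [0.5, -1.0, 2.0]
W_T = [[0.1, 0.0, 0.2],
       [0.0, 0.3, 0.0],
       [0.5, 0.5, 0.5]]
b_T = [-1.0, -1.0, -1.0]

t = transform_gate(W_T, b_T, x)
c = carry_gate(t)
print("T:", t)
print("C:", c)
```

Because C is derived from T, the gates add no parameters beyond those of the transform gate itself.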


Structure

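The structure of a hidden layer follows the equation:

$$ y = H(W_H, x) \cdot T(W_T, x) + x \cdot C(W_C, x) $$

where x is the layer input, y is the layer output, and the products are taken element-wise.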

The advantage of a highway network over a conventional deep neural network is that it mitigates the vanishing gradient problem, which makes deep networks easier to optimize.
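
The layer computation y = H · T + x · (1 - T) can be sketched in plain Python as follows. This is a minimal forward-pass illustration, not a reference implementation: the choice of tanh for H, the bias terms, the small random weights, the depth of 50 and the gate bias values are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = H * T + x * (1 - T), element-wise."""
    h = [math.tanh(a) for a in affine(W_H, b_H, x)]  # H: tanh, chosen for illustration
    t = [sigmoid(a) for a in affine(W_T, b_T, x)]    # T: transform gate
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]

def random_matrix(n, rng, scale=0.1):
    """Square matrix of small random weights (hypothetical initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]

def run_stack(x, depth, gate_bias, seed=0):
    """Pass x through `depth` highway layers that all use the same gate bias."""
    rng = random.Random(seed)
    n = len(x)
    for _ in range(depth):
        W_H, W_T = random_matrix(n, rng), random_matrix(n, rng)
        x = highway_layer(x, W_H, [0.0] * n, W_T, [gate_bias] * n)
    return x

x0 = [1.0, -2.0, 0.5, 3.0]
carried = run_stack(x0, depth=50, gate_bias=-20.0)     # T close to 0: input is carried
transformed = run_stack(x0, depth=50, gate_bias=20.0)  # T close to 1: plain tanh layers
print("carried:    ", carried)
print("transformed:", transformed)
```

With a strongly negative gate bias, T is close to 0 and every layer passes its input through almost unchanged, so the output of the 50-layer stack stays close to x0. The same near-identity path is what lets gradients reach early layers without being attenuated at every step. With a strongly positive gate bias, T is close to 1 and the stack behaves like ordinary tanh layers.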


References

  1. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
  2. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems 28. Curran Associates, Inc.: 2377–2385.
  3. Liu, Liyuan; Shang, Jingbo; Xu, Frank F.; Ren, Xiang; Gui, Huan; Peng, Jian; Han, Jiawei (12 September 2017). "Empower Sequence Labeling with Task-Aware Neural Language Model". arXiv:1709.04109 [cs.CL].
  4. Kurata, Gakuto; Ramabhadran, Bhuvana; Saon, George; Sethy, Abhinav (19 September 2017). "Language Modeling with Highway LSTM". arXiv:1709.06436 [cs.CL].


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.