Highway network

In machine learning, a highway network is a neural network architecture designed to make very deep networks easier to optimize. Highway networks use learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The gates let information pass across many layers largely unchanged, along paths called "information highways".[1][2]

Highway networks have been applied to text sequence labeling and speech recognition.[3][4]

Model

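A highway layer combines three functions of its input x: the transform H(x, W_H), the transform gate T(x, W_T), and the carry gate C(x, W_C). The two gates are non-linear transfer functions, by convention sigmoid functions, so their outputs lie between 0 and 1. H(x, W_H) can be any desired transfer function, for example an affine transformation followed by a non-linearity.

The transform gate is a layer with a sigmoid transfer function, and the carry gate is defined as C(x, W_C) = 1 - T(x, W_T). The two gates therefore sum to 1 for every unit: the transform gate sets how much of H(x, W_H) reaches the output, and the carry gate sets how much of the input x passes through unchanged.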
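
The coupling between the two gates can be illustrated with a minimal sketch for a single unit. The names `sigmoid` and `gates` are hypothetical, and the scalar `z` stands in for the gate's pre-activation (assumed here to be an affine function of x, which the text above does not spell out).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gates(z: float) -> tuple[float, float]:
    """Return (transform, carry) for one unit given its gate pre-activation z.

    Assumption of this sketch: z is whatever the gate computes from x and
    W_T before the sigmoid is applied.
    """
    t = sigmoid(z)
    return t, 1.0 - t  # carry gate is defined as C = 1 - T


# Negative z favours carrying the input, positive z favours transforming it.
for z in (-4.0, 0.0, 4.0):
    t, c = gates(z)
    print(f"z={z:+.1f}  T={t:.3f}  C={c:.3f}")
```

Because C is derived from T, the layer needs no separate carry-gate parameters in this formulation.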

Structure

The structure of a hidden layer follows the equation:

$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) = H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T))$
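
The layer equation can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a reference implementation: H is taken to be an affine map followed by tanh, the gate is an affine map followed by a sigmoid, the bias vectors `b_H` and `b_T` and all function names are hypothetical, and the products in the equation are applied elementwise. Since x is added to the output unit by unit, the sketch uses square weight matrices so that x, H, T and y share one dimension.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """Return y = H * T + x * (1 - T), elementwise.

    Assumed choices: H = tanh(W_H x + b_H), T = sigmoid(W_T x + b_T).
    """
    h = [math.tanh(v) for v in affine(W_H, b_H, x)]
    t = [sigmoid(v) for v in affine(W_T, b_T, x)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


def random_matrix(n, rng):
    """Square n x n matrix with small random entries (illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
n = 4
x = [0.3, -1.2, 0.8, 0.1]
W_H, W_T = random_matrix(n, rng), random_matrix(n, rng)
b_H = [0.0] * n

# Strongly negative gate bias: T is near 0, so the layer mostly carries x.
y_carry = highway_layer(x, W_H, b_H, W_T, [-20.0] * n)
# Strongly positive gate bias: T is near 1, so the layer mostly outputs H.
y_transform = highway_layer(x, W_H, b_H, W_T, [20.0] * n)
print(y_carry)
print(y_transform)
```

The two calls exercise the extremes of the gate: with T close to 0 the output is close to the input x, and with T close to 1 it is close to H(x, W_H).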

The advantage of a highway network over a plain deep neural network is that it mitigates the vanishing gradient problem: where the carry gate is open, the input and its gradient pass through the layer largely unchanged, which makes very deep networks easier to optimize.
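
One way to see this is to look at the two limiting cases of the layer equation. The sketch below assumes the gate is fully saturated at 0 or 1, so that the derivative of the gate itself can be neglected; I denotes the identity matrix and H' the Jacobian of H with respect to x.

$y = \begin{cases} x, & \text{if } T(x, W_T) = 0, \\ H(x, W_H), & \text{if } T(x, W_T) = 1, \end{cases} \qquad \frac{dy}{dx} = \begin{cases} I, & \text{if } T(x, W_T) = 0, \\ H'(x, W_H), & \text{if } T(x, W_T) = 1. \end{cases}$

In the carry case the Jacobian is the identity, so a gradient propagated backwards through the layer is neither shrunk nor amplified. Between the two extremes, the layer blends this behaviour with that of a plain layer.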

References

[1] Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
[2] Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems 28. Curran Associates, Inc.: 2377–2385.
[3] Liu, Liyuan; Shang, Jingbo; Xu, Frank F.; Ren, Xiang; Gui, Huan; Peng, Jian; Han, Jiawei (12 September 2017). "Empower Sequence Labeling with Task-Aware Neural Language Model". arXiv:1709.04109 [cs.CL].
[4] Kurata, Gakuto; Ramabhadran, Bhuvana; Saon, George; Sethy, Abhinav (19 September 2017). "Language Modeling with Highway LSTM". arXiv:1709.06436 [cs.CL].

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
