Gated recurrent unit

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.^[1] Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets.^[2]

They have fewer parameters than LSTM, as they lack an output gate.^[3]

Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called minimal gated unit.

The operator $\circ$ denotes the Hadamard product in the following.

Fully gated unit

Gated Recurrent Unit, fully gated version

Initially, for $t=0$ , the output vector is $h_{0}=0$ .

{\begin{aligned}z_{t}&=\sigma _{g}(W_{z}x_{t}+U_{z}h_{t-1}+b_{z})\\r_{t}&=\sigma _{g}(W_{r}x_{t}+U_{r}h_{t-1}+b_{r})\\h_{t}&=(1-z_{t})\circ h_{t-1}+z_{t}\circ \sigma _{h}(W_{h}x_{t}+U_{h}(r_{t}\circ h_{t-1})+b_{h})\end{aligned}}

Variables

$x_{t}$ : input vector
$h_{t}$ : output vector
$z_{t}$ : update gate vector
$r_{t}$ : reset gate vector
$W$ , $U$ and $b$ : parameter matrices and vector

Activation functions

$\sigma _{g}$ : The original is a sigmoid function.
$\sigma _{h}$ : The original is a hyperbolic tangent.

Alternative activation functions are possible, provided that $\sigma _{g}(x)\in [0,1]$ .

Type 1

Type 2

Type 3

Alternate forms can be created by changing $z_{t}$ and $r_{t}$ ^[4]

Type 1, each gate depends only on the previous hidden state and the bias.
${\begin{aligned}z_{t}&=\sigma _{g}(U_{z}h_{t-1}+b_{z})\\r_{t}&=\sigma _{g}(U_{r}h_{t-1}+b_{r})\\\end{aligned}}$
Type 2, each gate depends only on the previous hidden state.
${\begin{aligned}z_{t}&=\sigma _{g}(U_{z}h_{t-1})\\r_{t}&=\sigma _{g}(U_{r}h_{t-1})\\\end{aligned}}$
Type 3, each gate is computed using only the bias.
${\begin{aligned}z_{t}&=\sigma _{g}(b_{z})\\r_{t}&=\sigma _{g}(b_{r})\\\end{aligned}}$

Minimal gated unit

The minimal gated unit is similar to the fully gated unit, except the update and reset gate vector is merged into a forget gate. This also implies that the equation for the output vector must be changed ^[5]

{\begin{aligned}f_{t}&=\sigma _{g}(W_{f}x_{t}+U_{f}h_{t-1}+b_{f})\\h_{t}&=f_{t}\circ h_{t-1}+(1-f_{t})\circ \sigma _{h}(W_{h}x_{t}+U_{h}(f_{t}\circ h_{t-1})+b_{h})\end{aligned}}

Variables

$x_{t}$ : input vector
$h_{t}$ : output vector
$f_{t}$ : forget vector
$W$ , $U$ and $b$ : parameter matrices and vector

References

↑ Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].
↑ Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
↑ "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. Retrieved May 18, 2016.
↑ Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
↑ Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].

[MyUser_Arxiv.org_May_18_2016c-2] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].

[MyUser_Wildml.com_May_18_2016c-3] "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. Retrieved May 18, 2016.

[4] Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].

[5] Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].

Gated recurrent unit

Architecture

Fully gated unit

Minimal gated unit

See also

References