Recurrent Neural Networks (RNN) are an alternative to feedforward networks.
These networks contain loops: the output of a unit is fed back in as part of its input at the next time step.
We have inputs which are not independent of each other. For example speech input, where each input is the recording for a short window of time.
The activation unit takes the current input together with the output of the previous activation unit, and applies its activation function to the combination.
This allows information to be kept across time.
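As a rough sketch of a single recurrent step (plain numpy, with names and shapes chosen only for illustration, not tied to any particular library):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: combine the current input with the
    previous hidden state, then apply the activation function."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Illustrative shapes: 4-dimensional inputs, 8-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 8))   # input-to-hidden weights
W_hh = rng.normal(size=(8, 8))   # hidden-to-hidden weights (the "loop")
b_h = np.zeros(8)

h = np.zeros(8)                       # initial hidden state
for x_t in rng.normal(size=(5, 4)):   # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```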
However, this information degrades over time; if the relevant information came from much earlier in the sequence, it will be lost.
We can do backpropagation on the unrolled network, backpropagating through time (BPTT).
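A minimal sketch of this idea, assuming PyTorch is available (the names and sizes are just illustrative): unroll the same recurrent step over the sequence, compute a loss, and let autograd carry gradients back through every time step, including the earliest input.

```python
import torch

torch.manual_seed(0)
W_xh = torch.randn(4, 8, requires_grad=True)
W_hh = torch.randn(8, 8, requires_grad=True)

xs = torch.randn(5, 4, requires_grad=True)   # a sequence of 5 inputs
h = torch.zeros(8)

# Forward pass: the network is "unrolled" by this loop over time.
for x_t in xs:
    h = torch.tanh(x_t @ W_xh + h @ W_hh)

loss = h.sum()
loss.backward()    # backpropagation through time: gradients flow back
                   # through every iteration of the loop
print(xs.grad[0])  # even the first input in the sequence receives a gradient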
LSTMs (Long Short-Term Memory networks) are a more complex RNN architecture.
Each cell takes as input the cell state from the previous cell, \(C_{t-1}\).
The LSTM cell updates the cell state to \(C_{t}\) and pushes it to the next cell.
We have \(x_t\), the input to the cell, and \(h_{t-1}\), the output of the previous cell.
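In other words, the cell can be viewed as a mapping \((x_t, h_{t-1}, C_{t-1}) \mapsto (h_t, C_t)\); the gates below decide how each piece contributes.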
We run an activation function (typically \(\tanh\)) on the cell state \(C_t\) to get a candidate output.
We multiply this by the output of the output gate to get the actual output \(h_t\).
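In the usual notation (with \(\sigma\) the logistic sigmoid for the gates and \(\odot\) elementwise multiplication), this step is:

\[
o_t = \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)
\]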
We create a candidate change to the state, \(\tilde{C}_t\). We multiply this by the input gate value and add the result to the state.
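With the same notation, the candidate change and the input gate are:

\[
\tilde{C}_t = \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right), \qquad i_t = \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right)
\]

and the term added to the state is \(i_t \odot \tilde{C}_t\).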
The forget gate produces a multiplication factor between 0 and 1 that is applied to the previous state: it decides what fraction of the state to keep and what fraction to remove.
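The forget gate and the complete cell-state update, in the same notation:

\[
f_t = \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\]

Putting the pieces together, here is a minimal sketch of a single LSTM cell step in plain numpy (names and shapes are chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W and b hold the parameters of the four
    transforms: forget gate, input gate, candidate state, output gate."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate change to the state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    C_t = f_t * C_prev + i_t * C_tilde       # updated cell state
    h_t = o_t * np.tanh(C_t)                 # cell output
    return h_t, C_t

# Illustrative sizes: 4-dimensional input, 8-dimensional state.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(8, 12)) for k in "fiCo"}
b = {k: np.zeros(8) for k in "fiCo"}

h, C = np.zeros(8), np.zeros(8)
for x_t in rng.normal(size=(5, 4)):          # a sequence of 5 inputs
    h, C = lstm_step(x_t, h, C, W, b)
```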