Recurrent Neural Networks (RNN) are an alternative to feedforward networks.
These networks contain loops: the output of a unit is fed back in as part of its input at the next time step.
We have inputs which are not independent of each other. For example speech input, where each input is the recording for a short window of time.
The activation unit takes the current input together with the output of the previous activation unit, and applies its activation function to the combination.
This allows information to be kept across time.
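As a rough sketch of a single recurrent step (plain numpy, with names and shapes chosen only for illustration, not tied to any particular library):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: combine the current input with the
    previous hidden state, then apply the activation function."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Illustrative shapes: 4-dimensional inputs, 8-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 8))   # input-to-hidden weights
W_hh = rng.normal(size=(8, 8))   # hidden-to-hidden weights (the "loop")
b_h = np.zeros(8)

h = np.zeros(8)                       # initial hidden state
for x_t in rng.normal(size=(5, 4)):   # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```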
However, this information degrades over time; if the relevant information came from much earlier in the sequence, it will be lost.
We can do backpropagation on the unrolled network, backpropagating through time (BPTT).
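A minimal sketch of this idea, assuming PyTorch is available (the names and sizes are just illustrative): unroll the same recurrent step over the sequence, compute a loss, and let autograd carry gradients back through every time step, including the earliest input.

```python
import torch

torch.manual_seed(0)
W_xh = torch.randn(4, 8, requires_grad=True)
W_hh = torch.randn(8, 8, requires_grad=True)

xs = torch.randn(5, 4, requires_grad=True)   # a sequence of 5 inputs
h = torch.zeros(8)

# Forward pass: the network is "unrolled" by this loop over time.
for x_t in xs:
    h = torch.tanh(x_t @ W_xh + h @ W_hh)

loss = h.sum()
loss.backward()    # backpropagation through time: gradients flow back
                   # through every iteration of the loop
print(xs.grad[0])  # even the first input in the sequence receives a gradient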
LSTMs (Long Short-Term Memory networks) are a more complex RNN architecture.
Each cell takes as input the cell state from the previous cell, \(C_{t-1}\).
The LSTM cell updates the cell state to \(C_{t}\) and pushes it to the next cell.
We have \(x_t\), the input to the cell, and \(h_{t-1}\), the output of the previous cell.
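In other words, the cell can be viewed as a mapping \((x_t, h_{t-1}, C_{t-1}) \mapsto (h_t, C_t)\); the gates below decide how each piece contributes.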
We run an activation function (typically \(\tanh\)) on the cell state \(C_t\) to get a candidate output.
We multiply this by the output of the output gate to get the actual output \(h_t\).
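In the usual notation (with \(\sigma\) the logistic sigmoid for the gates and \(\odot\) elementwise multiplication), this step is:

\[
o_t = \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)
\]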
We create a candidate change to the state, \(\tilde{C}_t\). We multiply this by the input gate value and add the result to the state.
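With the same notation, the candidate change and the input gate are:

\[
\tilde{C}_t = \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right), \qquad i_t = \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right)
\]

and the term added to the state is \(i_t \odot \tilde{C}_t\).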
The forget gate produces a multiplication factor between 0 and 1 that is applied to the previous state: it decides what fraction of the state to keep and what fraction to remove.
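The forget gate and the complete cell-state update, in the same notation:

\[
f_t = \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\]

Putting the pieces together, here is a minimal sketch of a single LSTM cell step in plain numpy (names and shapes are chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W and b hold the parameters of the four
    transforms: forget gate, input gate, candidate state, output gate."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate change to the state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    C_t = f_t * C_prev + i_t * C_tilde       # updated cell state
    h_t = o_t * np.tanh(C_t)                 # cell output
    return h_t, C_t

# Illustrative sizes: 4-dimensional input, 8-dimensional state.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(8, 12)) for k in "fiCo"}
b = {k: np.zeros(8) for k in "fiCo"}

h, C = np.zeros(8), np.zeros(8)
for x_t in rng.normal(size=(5, 4)):          # a sequence of 5 inputs
    h, C = lstm_step(x_t, h, C, W, b)
```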