We normalise the input layer. This speeds up training and makes regularisation better behaved.
We can normalise other layers as well: for each activation, we subtract the batch mean and then divide by the batch standard deviation.
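A minimal NumPy sketch of this per-batch normalisation (the function name and the `eps` safeguard are our own, not from the original text):

```python
import numpy as np

def batch_normalise(x, eps=1e-5):
    """Normalise each feature of x (shape: batch_size x features)."""
    mean = x.mean(axis=0)            # per-feature batch mean
    std = x.std(axis=0)              # per-feature batch standard deviation
    return (x - mean) / (std + eps)  # eps guards against division by zero

# After normalisation, each column has roughly zero mean and unit variance.
x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(64, 4))
y = batch_normalise(x)
```

Note that a full batch-normalisation layer also learns a scale and shift per feature; the sketch above shows only the normalisation step described here.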
Batch normalisation can also make a trained network easier to adapt to related problems.