This is a method for both building and training a network.
We start with a bare-bones network and then add nodes one by one, training each node and then fixing its values.
It is an alternative to backpropagation for training a feedforward neural network.
We start with random parameters for each layer \(W_i\).
For a single hidden layer we have:
\(\hat y = W_2\,\sigma(W_1 x)\)
Deeper networks follow the same pattern, e.g. \(\hat y = W_3\,\sigma(W_2\,\sigma(W_1 x))\).
We calculate:
\(W_2 = Y\,\sigma(W_1 X)^+\)
where \(X\) and \(Y\) stack the training inputs and targets as columns, and \(^+\) denotes the Moore–Penrose pseudoinverse.
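To make this step concrete, here is a minimal NumPy sketch of the \(W_2\) assignment. It assumes a single hidden layer, tanh as \(\sigma\), and examples stacked as the columns of \(X\) and \(Y\); the shapes and the random data are illustrative only.

```python
import numpy as np

def sigma(z):
    """Elementwise hidden-layer nonlinearity (tanh as an example)."""
    return np.tanh(z)

# Illustrative sizes: d inputs, h hidden units, k outputs, N examples.
d, h, k, N = 8, 32, 3, 200
rng = np.random.default_rng(0)

X = rng.normal(size=(d, N))   # training inputs, one example per column
Y = rng.normal(size=(k, N))   # training targets, one example per column

W1 = rng.normal(size=(h, d))  # random hidden weights, never updated
H = sigma(W1 @ X)             # hidden activations, shape (h, N)

# Closed-form assignment of the linear output layer:
# W2 = Y H^+ minimises ||Y - W2 H||_F over W2.
W2 = Y @ np.linalg.pinv(H)

Y_hat = W2 @ H                # predictions with the assigned output weights
print(np.linalg.norm(Y - Y_hat))
```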
So \(W_1\) is random and not updated.
\(W_2\) is assigned in closed form to minimise the squared loss; no activation function is applied after \(W_2\), so the output layer is linear, which is what makes the least-squares solution exact.
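One possible reading of the node-by-node construction is sketched below, under the same assumptions as above: each new hidden unit gets random weights that are then fixed, and only the linear output layer is re-solved after every addition. The stopping rule and any per-unit training step are not specified in the text, so this is an illustration rather than the exact procedure.

```python
import numpy as np

def sigma(z):
    return np.tanh(z)

def grow_network(X, Y, max_hidden, rng):
    """Start from a bare-bones network and add hidden units one at a time.
    Each new unit's incoming weights are drawn at random and then fixed;
    after every addition the output layer W2 is re-solved in closed form."""
    d = X.shape[0]
    W1 = np.empty((0, d))                    # no hidden units yet
    W2 = None
    for _ in range(max_hidden):
        w_new = rng.normal(size=(1, d))      # new hidden unit, then frozen
        W1 = np.vstack([W1, w_new])
        H = sigma(W1 @ X)                    # activations of all fixed units
        W2 = Y @ np.linalg.pinv(H)           # re-assign the linear output layer
    return W1, W2

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 200))
Y = rng.normal(size=(3, 200))
W1, W2 = grow_network(X, Y, max_hidden=32, rng=rng)
print(np.linalg.norm(Y - W2 @ sigma(W1 @ X)))
```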