Parameters are set to \(0\) and not trained.
Parameters share the same value and are trained together.
After each update, multiply the parameter by a factor \(p\) with \(0<p<1\), so that it shrinks toward \(0\).
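A minimal sketch of this shrinking step, assuming a plain gradient-descent update. The function name `sgd_step_with_shrink`, the learning rate `lr` and the default values are illustrative assumptions, not part of the notes.

```python
import numpy as np

def sgd_step_with_shrink(w, grad, lr=0.1, p=0.99):
    """One update followed by multiplicative shrinkage (hypothetical helper)."""
    w = w - lr * grad   # ordinary gradient step (assumed update rule)
    return p * w        # after the update, multiply the parameter by p < 1

w0 = np.array([1.0, -2.0, 0.5])
w = w0.copy()
zero_grad = np.zeros_like(w)

# With a zero gradient only the shrinkage acts, so w decays geometrically.
for _ in range(3):
    w = sgd_step_with_shrink(w, zero_grad)
```

With no gradient signal, each parameter is scaled by \(p^3\) after three updates, which shows how parameters that the loss does not support drift toward \(0\).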
The input can be modified to obtain any desired classification.
In a node we have:
\(a_{ij}=\sigma_{ij}(W_{ij}a_{i-1})\)
That is, the value of a node is the activation function applied to the weighted sum of the values of the previous layer.
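As a rough illustration of the node formula, the sketch below computes one node's value. The sigmoid activation and the example numbers are assumptions chosen for the demonstration; `node_value` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_value(w_ij, a_prev, activation=sigmoid):
    """a_ij = sigma_ij(W_ij a_{i-1}) for a single node j in layer i."""
    # W_ij is the weight vector of node j; the dot product is the weighted sum.
    return activation(np.dot(w_ij, a_prev))

a_prev = np.array([1.0, 2.0, 3.0])   # values of layer i-1
w_ij = np.array([0.5, -0.5, 0.0])    # weights of node j in layer i
value = node_value(w_ij, a_prev)     # sigmoid(0.5 - 1.0 + 0.0) = sigmoid(-0.5)
```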
Residual blocks, however, look further back than one layer. They add the full output \(a_k\) of an earlier layer \(k<i-1\) to the weighted sum, without applying weights to it:
\(a_{ij}=\sigma_{ij}(W_{ij}a_{i-1}+a_k)\)
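The residual formula can be sketched for a whole layer at once. This assumes the earlier layer's output `a_k` has the same shape as the output of layer \(i\) so that the two can be added; the ReLU activation and the name `residual_layer` are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_layer(W_i, a_prev, a_k, activation=relu):
    """a_i = sigma(W_i a_{i-1} + a_k), with a_k added unweighted."""
    # Assumption: a_k has the same shape as W_i @ a_prev.
    return activation(W_i @ a_prev + a_k)

a_k = np.array([1.0, -1.0])      # output of an earlier layer k
a_prev = np.array([2.0, 0.5])    # output of layer i-1
W_i = np.zeros((2, 2))           # all-zero weights: only the skip path remains

out = residual_layer(W_i, a_prev, a_k)
```

With all weights at zero, the layer output is just the activation of `a_k`, so information from the earlier layer still reaches later layers even when the weighted path contributes nothing.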