We can connect each node in the first hidden layer to a subset of the input layer, e.g. one node for each 5×5 patch of pixels.
We also share weights across all the nodes in the first layer. This gives far fewer parameters, and the layer can still learn useful features.
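As a rough illustration of the parameter saving (assuming a 28×28 input with one hidden node per 5×5 window at stride 1 — these sizes are just an example, not from the notes):

```python
# Fully connected: every node in the 24x24 hidden layer sees all 28x28 inputs.
full = (28 * 28) * (24 * 24)   # 451,584 weights
# Local 5x5 windows with one shared set of weights for the whole layer.
shared = 5 * 5                 # 25 weights
print(full, shared)
```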
Convolutional layers also use windows, but instead of taking the max (as pooling layers do, described below) we multiply the window by a matrix elementwise and sum the values.
Each matrix can represent some feature, like a curve.
We can use multiple convolution matrices to create multiple output matrices.
These matrices are called kernels; they start off random and are trained.
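A minimal numpy sketch of this multiply-and-sum over windows (stride 1, no padding; all names and sizes here are illustrative):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image; at each window position,
    multiply elementwise and sum to get one output value."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.standard_normal((5, 5))   # starts off random, learned in training
image = rng.standard_normal((28, 28))
print(conv2d(image, kernel).shape)     # (24, 24)
```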
We split the data up every time we use convolutional layers.
Flattening layers bring it all back together into a single vector.
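Under the same illustrative assumptions, a sketch of several kernels producing several output matrices (feature maps), which a flattening layer joins into one vector:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))
kernels = rng.standard_normal((6, 5, 5))   # 6 trainable kernels

# Every 5x5 window of the image at stride 1: shape (24, 24, 5, 5)
windows = np.lib.stride_tricks.sliding_window_view(image, (5, 5))

# One output matrix per kernel: shape (6, 24, 24)
maps = np.einsum('ijhw,khw->kij', windows, kernels)

# The flattening layer concatenates everything into a single vector.
flat = maps.reshape(-1)                    # shape (3456,)
```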
The parameters are the same as for pooling layers (height, width, stride, padding), but also the set of convolution kernels.
We use different window sizes in parallel.
Pooling layers: the input is a matrix. We place a number of windows on the input matrix; the max of each window is an input to the next layer.
Means fewer parameters, easier to compute, less chance of overfitting
Parameters: height and width of the window, and stride (the amount the window shifts by each step)
We can also add padding to the edge of the image so we don’t lose data.
Same padding (pad with zeros); valid padding (no padding)
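A quick numpy illustration of the two padding modes (sizes are just an example):

```python
import numpy as np

x = np.ones((28, 28))
same = np.pad(x, 2)    # 'same' padding: surround with zeros, shape (32, 32);
                       # a 5x5 convolution over this gives 28x28 again
print(same.shape)      # (32, 32) -- 'valid' would use x unpadded
```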
A pooling layer compresses, e.g. taking 2×2 windows. Max pooling returns the highest activation in each window.
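A minimal sketch of 2×2 max pooling with stride 2 (illustrative values):

```python
import numpy as np

def max_pool(image, size=2, stride=2):
    """Each window contributes only its highest activation."""
    ih, iw = image.shape
    oh = (ih - size) // stride + 1
    ow = (iw - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = image[i*stride:i*stride+size,
                              j*stride:j*stride+size].max()
    return out

x = np.array([[1., 3., 2., 1.],
              [4., 2., 5., 0.],
              [1., 1., 2., 2.],
              [0., 6., 1., 3.]])
print(max_pool(x))   # [[4. 5.]
                     #  [6. 3.]]
```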
The outputs of convolutions are scalars. However, we can also create vectors if we associate some convolutions with each other.
e.g. if we have 6 convolutions, their outputs can be used to create a 6-dimensional vector for each window.
We can normalise the length of these vectors to between \(0\) and \(1\).
The length of this vector represents the chance of finding the feature we are looking for, and its direction represents the orientation.
If the vector length is low, the feature was not found; if high, it was found.
So we get orientation from the vector, and position from the window.
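One common form of this normalisation is the "squash" function from the capsule networks paper (Sabour et al., 2017); that these notes mean exactly this form is an assumption:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Scale a vector so its length lies in (0, 1) while keeping its
    direction: short vectors shrink toward 0, long ones approach 1."""
    norm2 = np.sum(s * s)
    return (norm2 / (1.0 + norm2)) * s / (np.sqrt(norm2) + eps)

# 6 convolution outputs grouped into one 6-dimensional capsule vector
s = np.array([0.5, -1.2, 0.3, 2.0, -0.7, 0.1])
v = squash(s)
print(np.linalg.norm(v))   # ~0.86: high length, feature likely present
```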
We now have a layer of positions and orientations of basic shapes (triangles, rectangles, etc.)
We want to know which more complex thing they are part of.
So the output of this step is again a matrix with position and orientation, but of more complex features
To determine the activation from each basic shape to the next feature we use routing-by-agreement.
This takes each basic shape and works out what the complex feature would look like if it were present.
If a complex feature is made of two basic shapes, both shapes will predict the same complex shape; otherwise the relationship is spurious and their predictions will not match.
If the predictions agree, the connection gets a high weight.
This process is complex and computationally expensive.
However, we no longer need pooling layers.
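A simplified sketch of routing-by-agreement, loosely following the dynamic routing of Sabour et al. (2017); the shapes, iteration count, and helper names are illustrative assumptions:

```python
import numpy as np

def squash(s, eps=1e-8):
    n2 = np.sum(s * s)
    return (n2 / (1.0 + n2)) * s / (np.sqrt(n2) + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, iterations=3):
    """u_hat[i, j] is what basic shape i predicts complex feature j
    would look like. Agreeing predictions raise the weight b[i, j]."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                  # routing logits, start neutral
    for _ in range(iterations):
        c = softmax(b, axis=1)                   # each basic shape spreads its vote
        s = np.einsum('ij,ijd->jd', c, u_hat)    # weighted sum of predictions
        v = np.stack([squash(row) for row in s]) # complex-feature vectors
        b += np.einsum('ijd,jd->ij', u_hat, v)   # dot product measures agreement
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((10, 3, 8))   # 10 basic shapes, 3 complex features
print(route(u_hat).shape)                 # (3, 8)
```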
The network does a normal convolutional layer first, then a primary capsule layer, then a secondary capsule layer.
We have a vector space of feature positions and orientations, so we can recreate the output from it.