\(L_p\) norms can be used to measure the distance between two vectors.
If we have data points \(v\) and \(w\), the distance is:
\(||v-w||_p=\left(\sum_i |v_i-w_i|^p\right)^{\frac{1}{p}}\)
If \(p=2\) we have the Euclidean norm. If \(p=1\) we have the Manhattan norm.
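A minimal sketch in Python (NumPy assumed; the function name lp_distance is ours):

```python
import numpy as np

def lp_distance(v, w, p=2):
    """L_p distance: (sum_i |v_i - w_i|^p)^(1/p)."""
    diff = np.abs(np.asarray(v, dtype=float) - np.asarray(w, dtype=float))
    return np.sum(diff ** p) ** (1.0 / p)

v, w = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(lp_distance(v, w, p=2))  # Euclidean distance
print(lp_distance(v, w, p=1))  # Manhattan distance
```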
Given two vectors \(a\) and \(b\) we can calculate the cosine similarity:
\(\dfrac{a \cdot b}{||a||\,||b||}\)
If the two vectors are identical, this is \(1\). If they are orthogonal this is \(0\). If they are opposite, this is \(-1\).
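A minimal sketch of this calculation (cosine_similarity is our own helper name):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: a.b / (||a|| ||b||)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 0], [1, 0]))   #  1.0: same direction
print(cosine_similarity([1, 0], [0, 1]))   #  0.0: orthogonal
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0: opposite
```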
Kernel functions are a generalisation of the dot product, used when we want to find the similarity between two vectors.
If we have data points \(v\) and \(w\), the similarity is:
\(K(v, w)\)
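The choice of \(K\) is left open here; as one common example, the RBF (Gaussian) kernel gives a similarity of \(1\) for identical points that decays towards \(0\) as they move apart (a sketch; the bandwidth sigma is an assumed parameter):

```python
import numpy as np

def rbf_kernel(v, w, sigma=1.0):
    """RBF (Gaussian) kernel: exp(-||v - w||^2 / (2 * sigma^2))."""
    diff = np.asarray(v, dtype=float) - np.asarray(w, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0 for identical points
print(rbf_kernel([1.0, 2.0], [3.0, 4.0]))  # closer to 0 for distant points
```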
We have a point. How far is it from the mean?
For a single dimension: number of standard deviations.
What about multidimensional data?
We could count standard deviations along each dimension, but there may be correlations between variables. If two variables are highly correlated, being far from the mean in both is not really twice as far.
Instead we use the Mahalanobis distance:
\(D_M(\mathbf x)=\sqrt{(\mathbf x-\boldsymbol{\mu})^T S^{-1}(\mathbf x-\boldsymbol{\mu})}\)
where \(S\) is the covariance matrix.
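A minimal sketch, assuming \(S\) is estimated as the sample covariance matrix:

```python
import numpy as np

def mahalanobis(x, mu, S):
    """Mahalanobis distance: sqrt((x - mu)^T S^-1 (x - mu))."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return np.sqrt(diff @ np.linalg.inv(S) @ diff)

# Correlated two-dimensional data; mean and covariance estimated from the sample.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mu, S = data.mean(axis=0), np.cov(data, rowvar=False)
print(mahalanobis([2.0, 2.0], mu, S))
```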
If we have matrices \(A\) and \(B\), the distance is:
\(||A-B||=\sqrt {\sum_i \sum_j |a_{ij}-b_{ij}|^2}\)
This is the Euclidean norm applied to the matrix entries, also known as the Frobenius norm.
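A minimal sketch; for a matrix argument NumPy's np.linalg.norm computes the same quantity by default:

```python
import numpy as np

def frobenius_distance(A, B):
    """Frobenius distance: sqrt(sum_ij |a_ij - b_ij|^2)."""
    D = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    return np.sqrt(np.sum(np.abs(D) ** 2))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 0.0], [0.0, 4.0]])
print(frobenius_distance(A, B))  # sqrt(0 + 4 + 9 + 0) = 3.6055...
print(np.linalg.norm(A - B))     # same value
```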
We may want to examine the similarity between two sequences.
We want to match a sample from one sequence to a sample from the other sequence.
Simply matching samples at the same time point is naive, as the sequences may move at different speeds or be offset from each other.
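One standard technique for this kind of matching is dynamic time warping; a minimal sketch, assuming the absolute difference between samples as the local cost:

```python
import numpy as np

def dtw_distance(s, t):
    """Minimal cumulative cost of aligning sequences s and t,
    allowing samples to stretch or shift in time."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # A sample may match one-to-one, repeat, or be skipped.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([0, 1, 2, 3, 2, 1], [0, 0, 1, 2, 3, 2, 1]))  # 0.0 despite the offset
```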
Say we have a distance function and a sample. How can we identify the \(k\)-nearest neighbours?
We can compute the distance to every point, sort these distances, and take the \(k\) observations with the smallest distances.
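A minimal sketch, assuming Euclidean distance as the default distance function:

```python
import numpy as np

def k_nearest_neighbours(points, query, k,
                         dist=lambda v, w: np.linalg.norm(v - w)):
    """Indices of the k points closest to the query under the given distance."""
    query = np.asarray(query, dtype=float)
    distances = np.array([dist(np.asarray(p, dtype=float), query) for p in points])
    return np.argsort(distances)[:k]  # indices of the k smallest distances

points = [[0, 0], [1, 1], [5, 5], [0.5, 0.2]]
print(k_nearest_neighbours(points, [0, 0], k=2))  # [0 3]
```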