We previously defined the population covariance as \(\sigma_{XY}=E[(X-\mu_X)^T(Y-\mu_Y)]\).
We define the sample covariance as \(\sigma_{XY}=\dfrac{1}{n}\sum_i(x_i-\bar x)(y_i-\bar y)\).
We can calculate this using matrices:
\(M=X-\bar x\)
\(N=Y-\bar y\)
\(\sigma_{XY}=\dfrac{1}{n}M^TN\).
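The matrix form above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `sample_covariance` and the example data are assumptions made here, and it uses the same \(\dfrac{1}{n}\) normalisation as the text.

```python
def sample_covariance(x, y):
    """Sample covariance with the 1/n normalisation used in the text."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred vectors, playing the role of M and N.
    m = [xi - x_bar for xi in x]
    n_vec = [yi - y_bar for yi in y]
    # For column vectors, M^T N is the dot product of the centred data.
    return sum(mi * ni for mi, ni in zip(m, n_vec)) / n


# Hypothetical example data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = sample_covariance(x, y)
```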
Dividing the covariance by the two standard deviations gives the correlation:
\(\rho_{XY}=\dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}\)
If we have \(p\) variables, each observed \(n\) times, we can form a \(p\times p\) matrix \(\Sigma\) where:
\(\Sigma_{ij} = \sigma_{ij}=\dfrac{1}{n}(X_i-\bar x_i)^T(X_j-\bar x_j)\)
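As a rough sketch of how \(\Sigma\) could be assembled entry by entry, the snippet below takes one list of observations per variable. The name `covariance_matrix` and the example data are illustrative assumptions, and the \(\dfrac{1}{n}\) factor divides by the number of observations.

```python
def covariance_matrix(columns):
    """columns: one list per variable, each holding the same number of observations."""
    num_obs = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / num_obs
        centred.append([v - mean for v in col])
    # Entry (i, j) is the scaled dot product of centred variables i and j.
    return [
        [sum(a * b for a, b in zip(ci, cj)) / num_obs for cj in centred]
        for ci in centred
    ]


# Hypothetical example: three variables, four observations each.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
sigma = covariance_matrix(data)
```

The diagonal entries are the variances of the individual variables, and the matrix is symmetric because \(\sigma_{ij}=\sigma_{ji}\).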
If \(\bar x = \bar y = 0\) then:
\(\sigma_{XY}=\dfrac{1}{n}X^TY\)
In the correlation matrix, each entry is the correlation between a pair of variables rather than their covariance.
The Pearson correlation coefficient is defined as the covariance normalised by the product of the individual standard deviations.
It lies between \(-1\) (total negative linear correlation) and \(1\) (total positive linear correlation), with \(0\) meaning no linear correlation.
\(\rho_{X,Y}=\dfrac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}\)
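The definition can be checked numerically with a short sketch. The function name `pearson` and the example data are assumptions for illustration. The covariance and both standard deviations use the same \(\dfrac{1}{n}\) normalisation, which cancels in the ratio.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


# Hypothetical data: perfectly increasing and perfectly decreasing linear relations.
r_pos = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_neg = pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

The two examples sit at the extremes of the range: an exact increasing line gives a value of \(1\) and an exact decreasing line gives \(-1\), up to floating-point rounding.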
For each of the two variables we rank its observations.
From \(X\) and \(Y\) we then obtain the rank variables \(R_X\) and \(R_Y\).
We then calculate the Pearson correlation coefficient between the rankings.
\(r_S=\dfrac{\operatorname{cov}(R_X, R_Y)}{\sigma_{R_X}\sigma_{R_Y}}\)
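A possible sketch of the rank-then-correlate procedure follows. The helper names `average_ranks`, `pearson` and `spearman` are assumptions made here. Giving tied values the average of their positions is one common convention and is not something the text specifies.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Pearson correlation applied to the ranks R_X and R_Y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y increases with x, but not linearly.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.0, 4.0, 9.0, 16.0, 26.0]
r_s = spearman(x, y)
tied = average_ranks([1.0, 2.0, 2.0, 3.0])
```

Because only the ordering matters, any strictly increasing relationship gives \(r_S = 1\) even when the Pearson correlation of the raw values is below \(1\).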
Concordant and discordant pairs
\(\tau = \dfrac{n_{\text{concordant}}-n_{\text{discordant}}}{\binom{n}{2}}\)
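To make the counting concrete, here is a small sketch. A pair of observations \((x_i,y_i)\), \((x_j,y_j)\) is taken to be concordant when \(x_i-x_j\) and \(y_i-y_j\) have the same sign and discordant when the signs differ. The function name `kendall_tau` is an assumption, and pairs tied in either variable are counted as neither, which is one possible convention.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / (n choose 2), scanning every pair of observations."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie; the pair is counted as neither (an assumption).
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: one swapped pair out of six.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

In this example five of the six pairs are concordant and one is discordant, so \(\tau = (5-1)/6\). The double loop is quadratic in \(n\), which is fine for a sketch but slow for large samples.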
If the data are centred (zero mean), the covariance of the first \(n\) observations is:
\(\sigma_{XY}^n=\dfrac{1}{n}X_n^TY_n\)
So:
\(\sigma_{XY}^{n+1}=\dfrac{n\sigma^n_{XY}+x_{n+1}^Ty_{n+1}}{n+1}\)
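The update can be sketched for scalar observations as below. The function name `update_centred_covariance` and the data stream are illustrative assumptions. The sketch also assumes the mean is known to be zero and stays zero as new observations arrive, since otherwise the means would have to be updated as well.

```python
def update_centred_covariance(cov_n, n, x_new, y_new):
    """One step of the recursion: fold observation n+1 into the covariance of the first n."""
    return (n * cov_n + x_new * y_new) / (n + 1)


# Hypothetical stream of observations, assumed to come from zero-mean variables.
xs = [1.0, -2.0, 0.5, 0.5]
ys = [2.0, -1.0, -0.5, -0.5]

cov = 0.0
for count, (x_new, y_new) in enumerate(zip(xs, ys)):
    # count is the number of observations already folded in.
    cov = update_centred_covariance(cov, count, x_new, y_new)

# Batch value from the centred formula, for comparison.
batch = sum(a * b for a, b in zip(xs, ys)) / len(xs)
```

The running value agrees with the batch formula after every step, so the covariance can be maintained without storing past observations.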