This is the size of the sample.
This is the smallest value in the sample.
This is the largest value in the sample.
This is the difference between the maximum and minimum.
This is the value below which 50% of the sample lies.
The \(x\)th percentile is the value below which \(x\%\) of the sample lies.
This is the difference between the \(75\)th percentile and the \(25\)th percentile.
This is the most common value in the sample.
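The descriptive statistics above can be sketched with Python's standard library. The sample values and variable names are illustrative assumptions. Percentiles have several competing conventions, so the `"inclusive"` method used here is one choice among several, not a definition taken from the text.

```python
import statistics

# Illustrative sample (an assumption, not data from the text).
sample = [2, 4, 4, 5, 7, 9, 11]

count = len(sample)                  # size of the sample
minimum = min(sample)                # smallest value
maximum = max(sample)                # largest value
value_range = maximum - minimum      # maximum minus minimum
median = statistics.median(sample)   # 50th percentile

# Quartiles: the 25th, 50th and 75th percentiles.
# method="inclusive" treats the sample as containing its own extremes.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1                        # interquartile range

mode = statistics.mode(sample)       # most common value

print(count, minimum, maximum, value_range, median, iqr, mode)
```

Other percentile conventions, such as the default `"exclusive"` method, can give slightly different quartiles on small samples.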
We previously defined the population mean as \(\mu=E[X]\).
The sample mean is defined as \(\bar x = \dfrac{1}{n}\sum_i x_i\).
We can subtract the sample mean from each entry in the sample. The centred sample then has a mean of \(0\), which is convenient for many calculations.
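As a small illustration of the sample mean and of centring, the sketch below uses a made-up sample and checks that the centred values average to zero. The variable names are assumptions.

```python
# Illustrative sample (an assumption, not data from the text).
sample = [2.0, 4.0, 6.0, 8.0]

# Sample mean: (1/n) * sum of the entries.
mean = sum(sample) / len(sample)

# Centring: subtract the sample mean from every entry.
centred = [x - mean for x in sample]

# The centred sample has mean zero (up to floating-point rounding).
centred_mean = sum(centred) / len(centred)

print(mean, centred, centred_mean)
```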
We previously defined the population variance as \(\sigma^2=E[(X-\mu)^2]\).
We define the sample variance as \(\sigma^2=\dfrac{1}{n}\sum_i(x_i-\bar x)^2\).
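The \(\dfrac{1}{n}\) formula above can be checked against the standard library, whose `statistics.pvariance` also divides by \(n\). The helper name `sample_variance` and the data are assumptions for illustration.

```python
import statistics

def sample_variance(xs):
    """Variance with a 1/n factor, matching the formula in the text."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

# Illustrative sample (an assumption).
sample = [2.0, 4.0, 6.0, 8.0]

variance = sample_variance(sample)
reference = statistics.pvariance(sample)  # also divides by n

print(variance, reference)
```

Note that `statistics.variance` divides by \(n-1\) instead, so it would not match this formula.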
We can calculate this using matrices:
\(M=X-\bar x\)
\(\sigma^2=\dfrac{1}{n}M^TM\).
If \(\bar x =0\) then:
\(\sigma^2=\dfrac{1}{n}X^TX\).
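The matrix form can be sketched in plain Python. This example assumes \(X\) has one row per observation and one column per variable, so that \(\dfrac{1}{n}M^TM\) has the per-column variances on its diagonal. The helper names and the data are hypothetical.

```python
import statistics

def column_means(X):
    """Mean of each column of a row-major matrix."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def centre(X):
    """M = X - x_bar, subtracting each column's mean."""
    means = column_means(X)
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def gram_over_n(M):
    """Compute (1/n) * M^T M for a row-major matrix M."""
    n, p = len(M), len(M[0])
    return [[sum(M[i][a] * M[i][b] for i in range(n)) / n
             for b in range(p)] for a in range(p)]

# Illustrative data: 3 observations of 2 variables (an assumption).
X = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]

M = centre(X)
S = gram_over_n(M)

# The diagonal should match the 1/n variance of each column.
col0 = [row[0] for row in X]
col1 = [row[1] for row in X]
print(S, statistics.pvariance(col0), statistics.pvariance(col1))
```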
\(\bar x_{n+1} = \dfrac{n\bar x_n+x_{n+1}}{n+1}\)
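The update rule for the mean lets us process one observation at a time without storing the whole sample. A minimal sketch, where the function name `update_mean` and the data are assumptions:

```python
def update_mean(mean_n, n, x_new):
    """Return the mean of n + 1 values, given the mean of the first n."""
    return (n * mean_n + x_new) / (n + 1)

# Illustrative stream of observations (an assumption).
values = [2.0, 4.0, 6.0, 8.0]

mean, n = 0.0, 0
for x in values:
    mean = update_mean(mean, n, x)
    n += 1

batch_mean = sum(values) / len(values)
print(mean, batch_mean)
```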
If the sample is centred:
\(\sigma_n^2=\dfrac{1}{n}X_n^TX_n\)
So:
\(\sigma_{n+1}^2=\dfrac{n\sigma_n^2 +x_{n+1}^Tx_{n+1}}{n+1}\)
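The variance update can be sketched for scalar observations. This assumes the data are centred about a fixed mean of zero, so that the variance is just the average of the squares. If the mean were re-estimated after each new observation, the update would need an extra correction term. The function name `update_variance` and the data are assumptions.

```python
def update_variance(var_n, n, x_new):
    """Update a 1/n variance about a fixed zero mean with one new value."""
    return (n * var_n + x_new * x_new) / (n + 1)

# Illustrative stream, already centred about zero (an assumption).
values = [-3.0, -1.0, 1.0, 3.0]

variance, n = 0.0, 0
for x in values:
    variance = update_variance(variance, n, x)
    n += 1

# Batch version: (1/n) * X^T X for a centred column vector X.
batch_variance = sum(x * x for x in values) / len(values)
print(variance, batch_variance)
```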