Self-information measures the surprise of an outcome; it is also called the surprisal.
When we observe an outcome, we gain information. We can develop a measure of how much information is associated with a specific observation.
Rule 1: Information is never negative.
Rule 2: If \(P(x)=1\), then the information is \(I(P(x))=0\).
Rule 3: If two events are independent, then their information is additive.
Let \(C\) be the event that independent events \(A\) and \(B\) both occur, so \(P(C)=P(A)P(B)\). Rule 3 then requires:
\(I(P(C))=I(P(A)P(B))\)
\(I(P(A))+I(P(B))=I(P(A)P(B))\)
A function which satisfies all three rules is \(I(P(A))=-\log(P(A))\).
Any base can be used; base 2 is the most common, in which case information is measured in bits.
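A minimal numerical check of these rules (the `self_information` helper below is just an illustrative name, not something from a library):

```python
import numpy as np

def self_information(p, base=2):
    """Self-information (surprisal) of an outcome with probability p."""
    return -np.log(p) / np.log(base)

# Rule 2: a certain event carries no information
print(self_information(1.0))    # 0.0 bits

# Rare outcomes are more surprising than common ones
print(self_information(0.5))    # 1.0 bit
print(self_information(0.01))   # ~6.64 bits

# Rule 3: for independent A and B, I(P(A)P(B)) = I(P(A)) + I(P(B))
p_a, p_b = 0.5, 0.25
print(np.isclose(self_information(p_a * p_b),
                 self_information(p_a) + self_information(p_b)))  # True
```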
Entropy measures the expected amount of information produced by a source.
\(H(P(x))=E[I(P(x))]\)
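For a discrete distribution this expectation is a probability-weighted sum of surprisals. A short sketch (the coin probabilities are made up for illustration):

```python
import numpy as np

def entropy(probs, base=2):
    """H = E[I] = -sum_x P(x) log P(x); zero-probability outcomes contribute nothing."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log(probs)) / np.log(base)

print(entropy([0.5, 0.5]))   # 1.0 bit    (fair coin)
print(entropy([0.9, 0.1]))   # ~0.47 bits (biased coin: less uncertainty expected)
print(entropy([1.0, 0.0]))   # 0.0 bits   (certain outcome)
```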
Entropy is similar to variance, in the sense that both measure uncertainty.
Entropy, however, makes no reference to the specific values of \(x\): if all values were multiplied by 100, or if parts of the distribution were cut up and swapped, the entropy would be unaffected.
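This can be checked directly with a small made-up distribution: rescaling the outcome values changes the variance but leaves the entropy alone.

```python
import numpy as np

probs  = np.array([0.2, 0.5, 0.3])
values = np.array([1.0, 2.0, 3.0])

def entropy_bits(p):
    return -np.sum(p * np.log2(p))

def variance(vals, p):
    mean = np.sum(p * vals)
    return np.sum(p * (vals - mean) ** 2)

# Entropy depends only on the probabilities, not on the values they are attached to
print(entropy_bits(probs))   # unchanged by any rescaling of the values

# Variance depends on the values: multiplying them by 100 scales it by 100**2
print(variance(values, probs), variance(values * 100, probs))
```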
For a continuous density \(p(z)\), the (differential) entropy is:
\(H(p)=-\int p(z)\ln p(z)\,dz\).
This is a measure of the spread of a distribution.
Differential entropy can be negative; it approaches negative infinity as the distribution collapses onto a single point, i.e. no uncertainty.
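As a sketch, the integral can be evaluated by numerical quadrature for a 1-D Gaussian and compared with the closed form \(\frac{1}{2}\ln(2\pi e\sigma^2)\); a very narrow Gaussian shows the entropy going negative as uncertainty shrinks.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def differential_entropy(pdf, lo, hi):
    """H(p) = -integral of p(z) ln p(z) dz over [lo, hi], by numerical quadrature."""
    value, _ = quad(lambda z: -pdf(z) * np.log(pdf(z)), lo, hi)
    return value

for sigma in [1.0, 0.01]:
    pdf = norm(loc=0.0, scale=sigma).pdf
    numeric = differential_entropy(pdf, -20 * sigma, 20 * sigma)  # covers essentially all the mass
    closed  = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
    print(sigma, numeric, closed)   # the narrow Gaussian has negative entropy
```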
For a \(d\)-dimensional multivariate Gaussian, \(H=\frac{d}{2}\ln(2\pi e)+\frac{1}{2}\ln|\Sigma|=\frac{1}{2}\ln\left((2\pi e)^d|\Sigma|\right)\).
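A quick sanity check of this formula (the covariance matrix below is arbitrary; `scipy.stats.multivariate_normal` provides an `entropy()` method, in nats):

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 3
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])

# Closed form: H = (1/2) ln((2*pi*e)^d |Sigma|)
closed_form = 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(Sigma))

# scipy's frozen multivariate normal computes the same quantity
print(closed_form, multivariate_normal(mean=np.zeros(d), cov=Sigma).entropy())
```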