We have a sample space, \(\Omega\), consisting of elementary events (single outcomes).
Distinct elementary events are disjoint sets.
We have a \(\sigma\)-algebra over \(\Omega\) called \(F\). A \(\sigma\)-algebra over a set is a collection of subsets of that set which contains the set itself and is closed under complement and countable union. The power set of \(\Omega\) is an example.
All events \(E\) are subsets of \(\Omega\):
\(\forall E\in F\ [E\subseteq \Omega ]\)
Events are mutually exclusive if they are disjoint sets.
For each event \(E\), there is a complementary event \(E^C\) such that:
\(E\lor E^C=\Omega\)
\(E\land E^C=\varnothing\)
This exists by construction, as \(F\) is closed under complement.
As events are sets, we can use the algebra of sets on them. For example, for two events \(E_i\) and \(E_j\) we can define:
\(E_i\land E_j\)
\(E_i\lor E_j\)
For all events \(E\) in \(F\), the probability function \(P\) is defined.
This gives us the following measure space:
\((\Omega, F, P)\)
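As a concrete sketch, this measure space can be written out in Python for a fair six-sided die (an illustrative model; the names `omega`, `F`, and `P` below are ours, not from any library):

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space: the six elementary outcomes of a fair die.
omega = frozenset(range(1, 7))

# F: the power set of omega, which is a sigma-algebra over omega.
def power_set(s):
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

F = power_set(omega)

# P: elementary events are equally likely, so P(E) = |E| / |omega|.
def P(event):
    return Fraction(len(event), len(omega))

print(len(F))            # 64 events in the sigma-algebra
print(P(omega))          # 1
print(P(frozenset()))    # 0
```

Every event in `F` is a subset of `omega`, matching \(\forall E\in F\ [E\subseteq \Omega]\).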
First axiom
The probability of every event is a non-negative real number.
\(\forall E \in F [(P(E)\ge 0)\land (P(E)\in \mathbb{R})]\)
Second axiom
The probability that one of the elementary events occurs, that is, the probability of the whole sample space, is \(1\).
\(P(\Omega )=1\)
Third axiom
The probability of the union of countably many mutually exclusive events is the sum of their probabilities:
\(P(\cup^\infty_{i=1}E_i)=\sum_{i=1}^\infty P(E_i)\)
\(P(\Omega )=1\)
\(P(\Omega \lor \varnothing )=1\)
As \(\Omega\) and \(\varnothing\) are disjoint, the third axiom gives:
\(P(\Omega )+P(\varnothing )=1\)
\(P(\varnothing )=0\)
Consider \(E_i\subseteq E_j\), and let \(E_k=E_j\land E_i^C\) so that:
\(E_j=E_i\lor E_k\)
\(P(E_j)=P(E_i\lor E_k)\)
\(E_i\) and \(E_k\) are disjoint, so:
\(P(E_j)=P(E_i)+P(E_k)\)
We know that \(P(E_k)\ge 0\) from axiom \(1\) so:
\(P(E_j)\ge P(E_i)\)
As all events are subsets of the sample space:
\(P(\Omega )\ge P(E)\)
\(1\ge P(E)\)
From axiom \(1\) we then know:
\(\forall E\in F [0\le P(E)\le 1]\)
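A minimal numeric check of monotonicity and the \([0,1]\) bound, assuming a fair-die model (all names illustrative):

```python
from fractions import Fraction

# Fair-die model (illustrative): P(E) = |E| / 6.
omega = frozenset(range(1, 7))
def P(event):
    return Fraction(len(event), len(omega))

E_i = frozenset({2, 4})        # E_i is a subset of E_j
E_j = frozenset({2, 4, 6})

# Monotonicity: E_i a subset of E_j implies P(E_i) <= P(E_j).
assert E_i <= E_j and P(E_i) <= P(E_j)

# Every probability lies in [0, 1].
assert all(0 <= P(E) <= 1 for E in (frozenset(), E_i, E_j, omega))
```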
Some immediate identities follow:
\(P(E\land \varnothing )=P(\varnothing )=0\)
\(P(E\lor \Omega )=P(\Omega )=1\)
\(P(E\lor \varnothing)=P(E)\)
\(P(E\land \Omega )=P(E)\)
First, the separation rule:
\(P(E_i)=P(E_i\land \Omega)\)
\(P(E_i)=P(E_i\land (E_j\lor E_j^C))\)
\(P(E_i)=P((E_i\land E_j)\lor (E_i\land E_j^C))\)
As the two events on the right are disjoint:
\(P(E_i)=P(E_i\land E_j)+P(E_i\land E_j^C)\)
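The separation rule can be checked numerically; a sketch assuming a fair-die model (illustrative):

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # fair-die model (illustrative)
def P(event):
    return Fraction(len(event), len(omega))

E_i = frozenset({1, 2, 3})       # roll at most 3
E_j = frozenset({2, 4, 6})       # roll even
E_j_complement = omega - E_j

# Separation rule: E_i splits into its parts inside and outside E_j.
assert P(E_i) == P(E_i & E_j) + P(E_i & E_j_complement)
```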
We know that:
\(P(E_i\lor E_j)=P((E_i\lor E_j)\land (E_j\lor E_j^C))\)
By the distributive law of sets:
\(P(E_i\lor E_j)=P((E_i\land E_j^C)\lor E_j)\)
\(P(E_i\lor E_j)=P((E_i\land E_j^C)\lor (E_j\land (E_i\lor E_i^C)))\)
By the distributive law of sets:
\(P(E_i\lor E_j)=P((E_i\land E_j^C)\lor (E_j\land E_i)\lor (E_j\land E_i^C))\)
As these are disjoint:
\(P(E_i\lor E_j)=P(E_i\land E_j^C)+ P(E_j\land E_i)+P(E_j\land E_i^C)\)
From the separation rule:
\(P(E_i\lor E_j)=P(E_i)-P(E_i\land E_j)+ P(E_j\land E_i)+P(E_j)-P(E_j\land E_i)\)
\(P(E_i\lor E_j)=P(E_i)+P(E_j)-P(E_i\land E_j)\)
From the addition rule:
\(P(E_i\lor E_j)=P(E_i)+P(E_j)-P(E_i\land E_j)\)
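A quick check of the addition rule in the same illustrative fair-die model:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # fair-die model (illustrative)
def P(event):
    return Fraction(len(event), len(omega))

E_i = frozenset({1, 2, 3})       # roll at most 3
E_j = frozenset({2, 4, 6})       # roll even

# Addition rule: P(E_i or E_j) = P(E_i) + P(E_j) - P(E_i and E_j).
assert P(E_i | E_j) == P(E_i) + P(E_j) - P(E_i & E_j)
```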
Consider \(E\) and \(E^C\):
\(P(E\lor E^C)=P(E)+P(E^C)-P(E\land E^C)\)
We know that \(E\) and \(E^C\) are disjoint, that is:
\(E\land E^C=\varnothing\)
Similarly by construction:
\(E\lor E^C=\Omega\)
So:
\(P(\Omega )=P(E)+P(E^C)-P(\varnothing)\)
\(1=P(E)+P(E^C)\)
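The complement rule can likewise be checked; a sketch with a fair-die model (illustrative):

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # fair-die model (illustrative)
def P(event):
    return Fraction(len(event), len(omega))

E = frozenset({2, 4, 6})         # roll even
E_complement = omega - E

# Complement rule: P(E) + P(E^C) = 1.
assert P(E) + P(E_complement) == 1
```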
For an event \(E\), the odds of \(E\) occurring are defined as:
\(o_f=\dfrac{P(E)}{P(E^C)}\)
For example, the odds of rolling a \(6\) on a fair die are \(\dfrac{1/6}{5/6}=\dfrac{1}{5}\).
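The die example as a short computation (exact arithmetic via `fractions`):

```python
from fractions import Fraction

# Odds of rolling a 6 on a fair die: P(E) / P(E^C).
p = Fraction(1, 6)
odds = p / (1 - p)
print(odds)   # 1/5
```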
We know that summing a joint probability over all values of one variable gives the marginal probability of the other:
\(\sum_y P(X\land Y=y)=P(X)\)
So for the continuous case:
\(P(X)=\int_{-\infty }^{\infty }P(X\land Y=y)\,dy\)
The result behaves like an ordinary probability: marginalising one variable out of a joint distribution over many leaves a joint distribution over one fewer variable.
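The discrete marginalisation can be sketched with a small made-up joint table (the numbers below are illustrative, not from the text):

```python
from fractions import Fraction

# An illustrative joint distribution P(X, Y) over X in {0, 1}, Y in {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}
assert sum(joint.values()) == 1   # a valid probability distribution

# Marginalisation: summing the joint over all values of Y recovers P(X).
def marginal_x(x):
    return sum(p for (xv, _), p in joint.items() if xv == x)

print(marginal_x(0))   # 1/2
print(marginal_x(1))   # 1/2
```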