We want to estimate the parameters \(\theta\) of a model from observed data \(X\). One way to do this is to look at the likelihood function:
\(L(\theta ; X)=P(X|\theta )\)
The likelihood function gives the probability of the observed data being generated under specific parameter values. It is read as a function of \(\theta\), with the data \(X\) held fixed.
If it has a sharp peak, the data suggest that \(\theta\) lies in the region around that peak.
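As a rough illustration of what a peak in the likelihood looks like, the sketch below evaluates the likelihood of a coin's heads probability over a grid of candidate values. The coin model, the data (7 heads in 10 flips), the grid resolution, and the function name `binomial_likelihood` are all assumptions made for this example only.

```python
from math import comb

def binomial_likelihood(theta, heads, flips):
    """P(observing `heads` heads in `flips` flips | heads probability theta)."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)

# Hypothetical data: 7 heads observed in 10 flips.
heads, flips = 7, 10

# Candidate values of theta: 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

# The candidate with the highest likelihood marks the peak.
best_theta = max(grid, key=lambda t: binomial_likelihood(t, heads, flips))
print(best_theta)  # 0.7
```

With this data the likelihood peaks at the observed fraction of heads, and values of \(\theta\) far from 0.7 make the observed data much less probable.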
When the data consist of multiple events, for example a sequence of coin flips, the likelihood is the joint probability of all of them:
\(L(\theta ; X)=P(X|\theta )\)
\(L(\theta ; X)=P(A_1 \land B_2 \land C_3 \land D_4 \land \dots|\theta )\)
If the events are independent, that is, the outcome of one flip does not depend on any other outcome, then the joint probability factorizes:
\(L(\theta ; X)=P(A_1|\theta ) \cdot P(B_2|\theta ) \cdot P(C_3|\theta ) \cdot P(D_4|\theta ) \cdots\)
If the events are also identically distributed, that is, the chance of flipping a head does not change across flips (for example, the heads side does not get heavier over time), then:
\(L(\theta ; X)=P(A|\theta ) \cdot P(B|\theta ) \cdot P(C|\theta ) \cdot P(D|\theta ) \cdots\)
\(L(\theta ; X)=\prod_{i=1}^n P(X_i|\theta )\)
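The product form can be sketched directly in code. This is a minimal example, assuming each flip is recorded as 1 (head) or 0 (tail) and that \(\theta\) is the probability of a head; the data and the function name `iid_likelihood` are hypothetical.

```python
from math import prod

def iid_likelihood(theta, flips):
    """Likelihood of a flip sequence, assuming independent, identically distributed flips.

    Each flip contributes P(X_i | theta): theta for a head (1), 1 - theta for a tail (0).
    """
    return prod(theta if x == 1 else 1 - theta for x in flips)

# Hypothetical data: 7 heads and 3 tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Compare a few candidate parameter values.
for theta in (0.3, 0.5, 0.7):
    print(theta, iid_likelihood(theta, flips))
```

Of the three candidates, \(\theta = 0.7\) gives the largest likelihood for this data. For long sequences the product becomes very small, so in practice the sum of log-probabilities is often used instead.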