We have a sample space \(\Omega\), consisting of elementary events.

All elementary events are disjoint sets.

We have a \(\sigma\)-algebra over \(\Omega \) called \(F\). A \(\sigma\)-algebra over a set is a collection of its subsets that contains the set itself and is closed under complement and countable union. The power set is an example.
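As a rough illustration of the power set example, the sketch below builds the power set of a small finite set and checks the usual \(\sigma\)-algebra conditions (contains the whole set, closed under complement and union). The helper names `power_set` and `is_sigma_algebra` are illustrative assumptions, and the check only makes sense for finite families.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega as a set of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra conditions for a finite family of subsets."""
    omega = frozenset(omega)
    if omega not in family:
        return False
    closed_under_complement = all(omega - e in family for e in family)
    # For a finite family, closure under pairwise union is enough to give
    # closure under countable union.
    closed_under_union = all(a | b in family for a in family for b in family)
    return closed_under_complement and closed_under_union

omega = {1, 2, 3}
F = power_set(omega)
too_small = {frozenset(), frozenset({1}), frozenset(omega)}  # misses {2, 3}

print(len(F))                              # 8
print(is_sigma_algebra(omega, F))          # True
print(is_sigma_algebra(omega, too_small))  # False
```

The last family fails because the complement of \(\{1\}\) is not included.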

All events \(E\) are subsets of \(\Omega\):

\(\forall E\in F [E\subseteq \Omega]\)

Events are mutually exclusive if they are disjoint sets.

For each event \(E\), there is a complementary event \(E^C\) such that:

\(E\lor E^C=\Omega\)

\(E\land E^C=\varnothing\)

The complement exists by construction, as \(F\) is closed under complement.

As events are sets, we can apply the algebra of sets to them. For example, for two events \(E_i\) and \(E_j\) we can define the intersection and the union:

\(E_i\land E_j\)

\(E_i\lor E_j\)

For all events \(E\) in \(F\), the probability function \(P\) is defined.

This gives us the following measure space:

\((\Omega, F, P)\)

First axiom

The probability of every event is a non-negative real number.

\(\forall E \in F [(P(E)\ge 0)\land (P(E)\in \mathbb{R})]\)

Second axiom

The probability that one of the elementary events occurs is \(1\).

The probability of the outcome set is \(1\).

\(P(\Omega )=1\)

Third axiom

For a countable sequence of pairwise mutually exclusive events, the probability of the union is the sum of the probabilities:

\(P(\bigcup^\infty_{i=1}E_i)=\sum_{i=1}^\infty P(E_i)\)
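The axioms can be checked on a small finite example. The sketch below assumes a fair six-sided die with a uniform probability function; `OMEGA`, `prob` and `events` are illustrative names, and additivity is only tested for pairs of disjoint events, which is the finite special case of the countable statement.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = frozenset(range(1, 7))  # assumed example: one roll of a fair die

def prob(event):
    """Uniform probability: each elementary event has probability 1/6."""
    return Fraction(len(event), len(OMEGA))

def events():
    """All events in the power set of OMEGA."""
    items = sorted(OMEGA)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

F = events()

non_negative = all(prob(e) >= 0 for e in F)
total_is_one = prob(OMEGA) == 1
# Additivity for every pair of disjoint events.
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in F for b in F if not (a & b)
)

print(non_negative, total_is_one, additive)  # True True True
```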

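The probability of the empty set follows from the axioms. We know that:

\(P(\Omega )=1\)

As \(\Omega \lor \varnothing =\Omega\):

\(P(\Omega \lor \varnothing )=1\)

As \(\Omega\) and \(\varnothing\) are disjoint, the probability of their union is the sum of their probabilities:

\(P(\Omega )+P(\varnothing )=1\)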

\(P(\varnothing )=0\)

Consider \(E_i\subseteq E_j\), and let \(E_k=E_j\land E_i^C\) be the part of \(E_j\) outside \(E_i\), so that:

\(E_j=E_i\lor E_k\)

\(P(E_j)=P(E_i\lor E_k)\)

As \(E_i\) and \(E_k\) are disjoint:

\(P(E_j)=P(E_i)+P(E_k)\)

We know that \(P(E_k)\ge 0\) from axiom \(1\) so:

\(P(E_j)\ge P(E_i)\)

As all events are subsets of the sample space:

\(P(\Omega )\ge P(E)\)

\(1\ge P(E)\)

From axiom \(1\) we then know:

\(\forall E\in F [0\le P(E)\le 1]\)
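A small numerical check of these consequences, again assuming a fair six-sided die with a uniform `prob` function (both are illustrative assumptions, not part of the derivation):

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed example: one roll of a fair die

def prob(event):
    """Uniform probability on the die outcomes."""
    return Fraction(len(event), len(OMEGA))

e_i = frozenset({2})        # "roll a 2"
e_j = frozenset({2, 4, 6})  # "roll an even number", so e_i is a subset of e_j
e_k = e_j - e_i             # the part of e_j outside e_i

print(prob(frozenset()))                    # 0
print(prob(e_j) == prob(e_i) + prob(e_k))   # True
print(0 <= prob(e_i) <= prob(e_j) <= 1)     # True
```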

For any event \(E\), the empty set and the sample space give the following identities:

\(P(E\land \varnothing )=P(\varnothing )=0\)

\(P(E\lor \Omega )=P(\Omega )=1\)

\(P(E\lor \varnothing)=P(E)\)

\(P(E\land \Omega )=P(E)\)

Firstly, we derive the separation rule:

\(P(E_i)=P(E_i\land \Omega)\)

\(P(E_i)=P(E_i\land (E_j\lor E_j^C))\)

\(P(E_i)=P((E_i\land E_j)\lor (E_i\land E_j^C))\)

As the latter two events are disjoint:

\(P(E_i)=P(E_i\land E_j)+P(E_i\land E_j^C)\)
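To make the split concrete, the sketch below separates one event into the part inside and the part outside a second event. The fair die and the two chosen events are assumptions made only for illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed example: one roll of a fair die

def prob(event):
    """Uniform probability on the die outcomes."""
    return Fraction(len(event), len(OMEGA))

e_i = frozenset({1, 2, 3})  # "roll at most 3"
e_j = frozenset({2, 4, 6})  # "roll an even number"
e_j_c = OMEGA - e_j         # complement of e_j

inside = prob(e_i & e_j)     # the outcome {2}
outside = prob(e_i & e_j_c)  # the outcomes {1, 3}

print(inside, outside, prob(e_i))  # 1/6 1/3 1/2
```

The two parts sum to \(P(E_i)\), as the two pieces are disjoint and together make up \(E_i\).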

We know that:

\(P(E_i\lor E_j)=P((E_i\lor E_j)\land (E_j\lor E_j^C))\)

By the distributive law of sets:

\(P(E_i\lor E_j)=P((E_i\land E_j^C)\lor E_j)\)

\(P(E_i\lor E_j)=P((E_i\land E_j^C)\lor (E_j\land (E_i\lor E_i^C)))\)

By the distributive law of sets:

\(P(E_i\lor E_j)=P((E_i\land E_j^C)\lor (E_j\land E_i)\lor (E_j\land E_i^C))\)

As these are disjoint:

\(P(E_i\lor E_j)=P(E_i\land E_j^C)+ P(E_j\land E_i)+P(E_j\land E_i^C)\)

From the separation rule:

\(P(E_i\lor E_j)=P(E_i)-P(E_i\land E_j)+ P(E_j\land E_i)+P(E_j)-P(E_j\land E_i)\)

\(P(E_i\lor E_j)=P(E_i)+P(E_j)-P(E_i\land E_j)\)
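The result can be compared against a direct computation of the union. This is a minimal sketch assuming a fair six-sided die; `prob_union` is a hypothetical helper name that applies the formula above.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed example: one roll of a fair die

def prob(event):
    """Uniform probability on the die outcomes."""
    return Fraction(len(event), len(OMEGA))

def prob_union(a, b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)

e_i = frozenset({1, 2, 3})  # "roll at most 3"
e_j = frozenset({2, 4, 6})  # "roll an even number"

print(prob_union(e_i, e_j))  # 5/6
print(prob(e_i | e_j))       # 5/6
```

Without the subtracted term the shared outcome \(2\) would be counted twice.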

From the addition rule:

\(P(E_i\lor E_j)=P(E_i)+P(E_j)-P(E_i\land E_j)\)

Consider \(E\) and \(E^C\):

\(P(E\lor E^C)=P(E)+P(E^C)-P(E\land E^C)\)

We know that \(E\) and \(E^C\) are disjoint, that is:

\(E\land E^C=\varnothing\)

Similarly by construction:

\(E\lor E^C=\Omega \)

So:

\(P(\Omega )=P(E)+P(E^C)-P(\varnothing)\)

\(1=P(E)+P(E^C)\)

Given an event \(E\), the odds in favour of \(E\) are defined as:

\(o_f=\dfrac{P(E)}{P(E^C)}\)

For example, the odds of rolling a \(6\) on a fair six-sided die are \(\dfrac{1}{5}\), as \(P(E)=\dfrac{1}{6}\) and \(P(E^C)=\dfrac{5}{6}\).
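The odds can be computed from a probability using \(P(E^C)=1-P(E)\). The sketch below uses exact fractions; the function names are illustrative, and `prob_from_odds` is an added convenience that inverts the definition.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Odds P(E) / P(E^C), using P(E^C) = 1 - P(E)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 for the odds to be finite")
    return p / (1 - p)

def prob_from_odds(o):
    """Invert the definition: P(E) = o / (1 + o)."""
    o = Fraction(o)
    return o / (1 + o)

print(odds_in_favour(Fraction(1, 6)))  # 1/5
print(prob_from_odds(Fraction(1, 5)))  # 1/6
```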

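For two variables \(X\) and \(Y\), summing the joint probability over all values of \(Y\) gives the probability of \(X\):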

\(\sum_yP(X\land Y)=P(X)\)

So for the continuous case:

\(P(X)=\int_{-\infty }^{\infty }P(X\land Y)dy\)

The result behaves like the probability of a single event. If there were more than \(2\) events to start with, it behaves like the joint probability of the remaining events, with one fewer event.
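For the discrete case, the sum can be sketched with a small joint table. The table values and the labels `x0`, `x1`, `y0`, `y1` below are made up purely for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint probabilities P(X = x and Y = y) for two binary variables.
joint = {
    ("x0", "y0"): Fraction(1, 10),
    ("x0", "y1"): Fraction(3, 10),
    ("x1", "y0"): Fraction(2, 10),
    ("x1", "y1"): Fraction(4, 10),
}

def marginal_x(joint):
    """Sum the joint probabilities over y to obtain P(X = x)."""
    totals = defaultdict(Fraction)  # Fraction() is zero
    for (x, _y), p in joint.items():
        totals[x] += p
    return dict(totals)

p_x = marginal_x(joint)
print(p_x["x0"], p_x["x1"])  # 2/5 3/5
```

The marginal probabilities still sum to \(1\), so the result is itself a probability function over \(X\) alone.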