We have a sample space, \(\Omega \). A random variable \(X\) is a mapping from the sample space to the real numbers:

\(X: \Omega \rightarrow \mathbb{R}\)

We can then define sets of elements of \(\Omega \) in terms of \(X\). As an example, take a coin toss together with a die roll. The sample space is:

\(\{H1,H2,H3,H4,H5,H6,T1,T2,T3,T4,T5,T6\}\)

A random variable could give us just the die value, such that:

\(X(H1)=X(T1)=1\), \(X(H2)=X(T2)=2\), and so on.
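As a minimal sketch (the outcome labels and the helper name `X` are illustrative, not fixed by the text), the combined sample space and this random variable might look like:

```python
from itertools import product

# Sample space for one coin toss and one die roll, e.g. "H3" or "T6".
omega = [f"{coin}{die}" for coin, die in product("HT", range(1, 7))]

# A random variable X: Omega -> R that returns just the die value.
def X(outcome):
    return int(outcome[1:])

assert X("H1") == X("T1") == 1
```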

We can define this more precisely using set-builder notation: for every \(c\in \mathbb{R}\), the following set is defined:

\(\{\omega |X(\omega )\le c\}\)

That is, for any random variable \(X\) and any \(c\in \mathbb{R}\), there is a corresponding subset of \(\Omega \) containing the \(\omega \)s which \(X\) maps to a value less than or equal to \(c\).
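These preimage sets can be computed directly. A sketch using the coin-and-die sample space from the example (the function name `preimage_le` is a hypothetical choice):

```python
from itertools import product

omega = [f"{coin}{die}" for coin, die in product("HT", range(1, 7))]

def X(outcome):
    return int(outcome[1:])

def preimage_le(c):
    """The event {omega : X(omega) <= c} as a subset of the sample space."""
    return {w for w in omega if X(w) <= c}

# For c = 2 this picks out the outcomes whose die value is 1 or 2.
print(sorted(preimage_le(2)))  # ['H1', 'H2', 'T1', 'T2']
```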

Multiple random variables can be defined on the same sample space. If we rolled a die we could define variables for:

- whether the result was odd or even
- the number on the die
- whether the result was less than 3

With more dice we could add even more variables.

If we define a variable \(X\), we can also define another variable \(Y=X^2\).

\(P(X=x)=P(\{\omega |X(\omega)=x\})\)

For discrete probability this is a useful quantity. For example, for a fair die, \(P(X=x)=\frac{1}{6}\) for each \(x\in \{1,\dots,6\}\).
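A sketch of computing \(P(X=x)\) as the probability of the preimage set, using the coin-and-die space from the example (each of the 12 outcomes equally likely):

```python
from fractions import Fraction
from itertools import product

omega = [f"{coin}{die}" for coin, die in product("HT", range(1, 7))]

def X(outcome):
    return int(outcome[1:])

def prob_X_equals(x):
    # P(X = x) = P({omega : X(omega) = x}); with equally likely outcomes
    # this is |event| / |Omega|.
    event = [w for w in omega if X(w) == x]
    return Fraction(len(event), len(omega))

print(prob_X_equals(1))  # 1/6, from the two outcomes {H1, T1}
```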

This is not helpful for continuous probability, where the chance of any specific outcome is \(0\).

Random variables are all real-valued, so we can write:

\(P(X\le x)=P(\{\omega |X(\omega)\le x\})\)

This is the cumulative distribution function \(F_X\). For a continuous variable:

\(F_X(x)=\int_{-\infty}^x f_X(u)du\)

For a discrete variable:

\(F_X(x)=\sum_{x_i\le x}P(X=x_i) \)

\(P(X\le x)+P(X\ge x)-P(X=x)=1\)

\(P(a< X\le b)=F_X(b)-F_X(a)\)
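The interval identity can be checked numerically. A sketch for a fair six-sided die (the clamped `F` below is only meant to be called at integer points):

```python
from fractions import Fraction

def F(x):
    """CDF of a fair six-sided die: F(x) = P(X <= x), for integer x."""
    return Fraction(min(max(int(x), 0), 6), 6)

# P(2 < X <= 5) = F(5) - F(2)
assert F(5) - F(2) == Fraction(3, 6)

# Check against a direct count of the outcomes {3, 4, 5}.
assert sum(Fraction(1, 6) for k in range(1, 7) if 2 < k <= 5) == F(5) - F(2)
```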

If \(X\) is continuous, the probability at any single point is \(0\). We instead look at probability density.

The density \(f_X\) is derived from the cumulative distribution function:

\(F_X(x)=\int_{-\infty}^x f_X(u)du\)

so \(f_X(x)=\dfrac{d}{dx}F_X(x)\) wherever the derivative exists.
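The CDF-density relationship can be checked numerically. A sketch using an exponential distribution with rate 1 as a hypothetical example (not from the text): the CDF is \(F(x)=1-e^{-x}\) and the density is \(f(x)=e^{-x}\) for \(x\ge 0\).

```python
import math

def F(x):
    """CDF of an exponential distribution with rate 1."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Its density, the derivative of F."""
    return math.exp(-x) if x >= 0 else 0.0

# A small central finite difference of F should approximate f.
x, h = 1.5, 1e-6
approx = (F(x + h) - F(x - h)) / (2 * h)
assert abs(approx - f(x)) < 1e-6
```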

For probability mass functions:

\(P(Y=y|X=x)=\dfrac{P(Y=y\land X=x)}{P(X=x)}\)

For probability density functions:

\(f_Y(y|X=x)=\dfrac{f_{X,Y}(x,y)}{f_X(x)}\)

The joint probability of two discrete variables is:

\(P(X=x\land Y=y)\)

Summing the joint over \(y\) gives the marginal:

\(P(X=x)=\sum_{y}P(X=x\land Y=y)\)

Equivalently, expanding each joint term with the definition of conditional probability gives the law of total probability:

\(P(X=x)=\sum_{y}P(X=x|Y=y)P(Y=y)\)
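Both marginalisation identities can be checked on a small table. The joint PMF below is a hypothetical example chosen only to sum to 1:

```python
from fractions import Fraction

# Hypothetical joint PMF over X in {0, 1} and Y in {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 6),  (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 12), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 6),
}

# Marginal: P(X = x) = sum over y of P(X = x and Y = y).
def p_X(x):
    return sum(p for (xi, y), p in joint.items() if xi == x)

def p_Y(y):
    return sum(p for (x, yi), p in joint.items() if yi == y)

# Law of total probability: P(X = x) = sum_y P(X = x | Y = y) P(Y = y).
def p_X_total(x):
    return sum((joint[(x, y)] / p_Y(y)) * p_Y(y) for y in (0, 1, 2))

assert p_X(0) == Fraction(1, 2)
assert p_X_total(1) == p_X(1)
```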

\(X\) is independent of \(Y\) if, writing \(P(x_i)\) as shorthand for \(P(X=x_i)\):

\(\forall x_i ,\forall y_j :\ P(x_i|y_j)=P(x_i)\)

If \(P(x_i|y_j)=P(x_i)\) then:

\(P(x_i\land y_j)=P(x_i).P(y_j)\)

This logic extends beyond two events. If the events are mutually independent then:

\(P(x_i\land y_j \land z_k)=P(x_i).P(y_j \land z_k)=P(x_i).P(y_j).P(z_k)\)
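The product rule for several independent events can be verified by enumeration. A sketch using three fair coin flips as a hypothetical example:

```python
from fractions import Fraction
from itertools import product

# Three independent fair coin flips; each of the 8 outcomes has probability 1/8.
omega = list(product("HT", repeat=3))

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == "H"  # first flip is heads
B = lambda w: w[1] == "H"  # second flip is heads
C = lambda w: w[2] == "H"  # third flip is heads

all_three = lambda w: A(w) and B(w) and C(w)
assert P(all_three) == P(A) * P(B) * P(C)  # 1/8 = 1/2 * 1/2 * 1/2
```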

Note that because:

\(P(x_i|y_j)=\dfrac{P(x_i\land y_j)}{P(y_j)}\)

If two variables are independent, then:

\(P(x_i|y_j)=\dfrac{P(x_i)P(y_j)}{P(y_j)}\)

\(P(x_i|y_j)=P(x_i)\)

\(A\) and \(B\) are conditionally independent given \(X\) if:

\(P(A\land B|X)=P(A|X)P(B|X)\)

Provided \(P(B\land X)>0\), this is the same as:

\(P(A|B\land X)=P(A|X)\)
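Conditional independence without unconditional independence can be demonstrated with a small mixture model. The setup below is a hypothetical example: pick a coin \(X\) uniformly (\(X=0\) fair, \(X=1\) biased with heads probability \(3/4\)) and toss it twice; \(A\) is "first toss heads", \(B\) is "second toss heads". Given the coin, the tosses are independent; unconditionally, they are not.

```python
from fractions import Fraction

# Coin X = 0 is fair; coin X = 1 lands heads with probability 3/4.
p_heads = {0: Fraction(1, 2), 1: Fraction(3, 4)}

# Joint distribution over (x, a, b); tosses are independent given the coin.
joint = {}
for x in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            pa = p_heads[x] if a else 1 - p_heads[x]
            pb = p_heads[x] if b else 1 - p_heads[x]
            joint[(x, a, b)] = Fraction(1, 2) * pa * pb

def P(pred):
    return sum(p for w, p in joint.items() if pred(w))

for x in (0, 1):
    px = P(lambda w: w[0] == x)
    # P(A and B | X = x) equals P(A | X = x) * P(B | X = x).
    lhs = P(lambda w: w[0] == x and w[1] == 1 and w[2] == 1) / px
    rhs = (P(lambda w: w[0] == x and w[1] == 1) / px) * \
          (P(lambda w: w[0] == x and w[2] == 1) / px)
    assert lhs == rhs

# Without conditioning on X, A and B are dependent.
assert P(lambda w: w[1] == 1 and w[2] == 1) != P(lambda w: w[1] == 1) ** 2
```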