# Entropy

## Entropy

### Information

#### Criteria

Self-information measures the surprise of an outcome; it is also called the surprisal.

When we observe an outcome we get information. We can develop a measure for how much information is associated with a specific measurement.

Rule 1: Information is always positive

Rule 2: If $$P(x)=1$$, then the information is $$I(P(x))=0$$.

Rule 3: If two events are independent, then their information is additive.

• $$P(C)=P(A)P(B)$$

• $$I(P(C))=I(P(A)P(B))$$

• $$I(P(A))+I(P(B))=I(P(A)P(B))$$

#### Choice of function

A function which satisfies these rules is $$I(P(A))=-\log(P(A))$$.

Any base can be used; base 2 is the most common, in which case information is measured in bits.
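The three rules can be checked directly against $$I(p)=-\log_2 p$$. A minimal sketch (the function name `self_information` is illustrative, not from the text):

```python
import math

def self_information(p, base=2):
    """Self-information (surprisal) of an outcome with probability p."""
    return -math.log(p, base)

# Rule 2: a certain event carries no information.
assert self_information(1.0) == 0.0

# Rule 3: for independent events A and B, P(C) = P(A)P(B)
# and the information is additive.
p_a, p_b = 0.5, 0.25
lhs = self_information(p_a * p_b)                      # I(P(A)P(B))
rhs = self_information(p_a) + self_information(p_b)    # I(P(A)) + I(P(B))
assert abs(lhs - rhs) < 1e-12
print(lhs)  # 3.0 bits
```

Rule 1 holds as well, since $$0 < p \le 1$$ implies $$-\log_2 p \ge 0$$.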

### Entropy

#### Introduction

Entropy measures the expected amount of information produced by a source.

$$H(P(x))=E(I(P(x)))$$

Entropy is similar to variance, in the sense that both measure uncertainty.

Entropy, however, makes no reference to the specific values of $$x$$. If all values were multiplied by 100, or if parts of the distribution were cut up and swapped, the entropy would be unaffected.
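For a discrete source, the expectation above becomes $$H=-\sum_x p(x)\log_2 p(x)$$. A small sketch of that sum, which also checks the claim that relabeling or permuting outcomes leaves entropy unchanged (the function name `entropy` is mine):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = E[I] = -sum p * log p, skipping p = 0 terms."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))  # 1.75 bits

# Entropy depends only on the probabilities, not on which value of x
# carries them, so permuting the outcomes changes nothing:
assert abs(entropy(p) - entropy(list(reversed(p)))) < 1e-12
```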

For a probability density function $$p(z)$$, its (differential) entropy is:

$$H(p)=-\int p(z)\ln p(z)dz$$.

This is a measure of the spread of a distribution.

Unlike discrete entropy, differential entropy can be negative; $$-\infty$$ corresponds to no uncertainty at all (a point mass).

For a $$d$$-dimensional multivariate Gaussian, $$H=\frac{1}{2}\ln\big((2\pi e)^d\,|\Sigma|\big)=\frac{d}{2}\ln(2\pi e)+\frac{1}{2}\ln|\Sigma|$$.
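The Gaussian closed form can be sanity-checked against a Monte Carlo estimate of $$-E[\ln p(z)]$$. A sketch for the one-dimensional case ($$d=1$$, $$\Sigma=\sigma^2$$); all names here are illustrative:

```python
import math
import random

sigma = 2.0

# Closed form: H = (1/2) ln(2*pi*e*sigma^2)
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def log_pdf(z):
    """Log-density of N(0, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - z**2 / (2 * sigma**2)

# Monte Carlo: H = -E[ln p(z)], averaged over samples from p.
random.seed(0)
samples = [random.gauss(0.0, sigma) for _ in range(200_000)]
mc_estimate = -sum(log_pdf(z) for z in samples) / len(samples)

print(closed_form, mc_estimate)  # the two values should roughly agree
```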