# Association rules

## Association rules

### Association rules

#### The data

We have a transaction dataset, $$D$$.

This includes transactions of items in $$I$$.

Any subset of $$I$$ is an itemset.

A subset of size $$k$$ is a $$k$$-itemset.

Transactions are a $$k$$-itemset with a unique id, tid.

The set of all transactions is $$T$$.

A tidset is a subset of $$T$$.

#### Forming a lattice

We have a total order on the items.

An itemset $$ab$$ is greater than $$a$$, for examplle.

The two points of the lattice are the nullset, and $$I$$.

#### Mappings

We have a mapping from $$I$$ to $$T$$ called $$t$$.

We have another mapping from $$T$$ to $$I$$ called $$i$$.

#### Frequency

We define the frequency of an itemset as the number of transactions it appears in.

We can write the frequency of $$A$$ as $$\sum A$$.

### Strong rules

#### Strong rules

We use frequent patterns to generate strong rules, $$R$$.

An example of a strong rule is $$a\rightarrow b$$.

We can look at this by comparing the support of $$a$$ to $$a\land b$$.

$$supp(A\rightarrow B)=P(A\land B)$$

$$conf(A\rightarrow B)=P(B|A)$$

$$conf(A\rightarrow B)=\dfrac{P(A\land B)}{P(A)}$$

### Support

#### Support

We define the support of an itemset as the proportion of transactions which contain the itemset.

$$supp(A)=\dfrac{\sum A }{n}$$.

We can also consider this as:

$$supp(A)=P(A)$$

### Frequent patterns

#### Frequent patterns

A frequent itemset is one where the support is above a minimum.

We know that if an itemset is frequent, then all its subsets are also frequent.

We look for frequent patterns, $$F$$, between the items $$I$$.

An example of a frequent pattern is $$\{a,b\}$$.

### Confidence

#### Confidence

The confidence of a frequent pattern is defined as:

$$conf (A\rightarrow B)=\dfrac{supp(A\land B)}{supp(A)}$$

### Finding strong rules using the Apriori algorithm

#### Finding frequent patterns

We can use search algorithms to find frequent patterns. Starting at the empty set.

#### Apriori algorithm

Breadth first search to generate candidate set of itemsets with support above some value.

Start with a $$1$$-itemset, and increase $$k$$ once done.

Once we have found a frequent pattern, we can immediately identify other frequent patterns associated with it.

We can do this by looking at confidence, not support.

### Interest

#### Interest

An alternative measure for finding rules is to use interest.

$$Interest(A\rightarrow B)=\dfrac{P(A\land B)}{P(A)P(B)}$$

$$Interest(A\rightarrow B)=\dfrac{supp(A\land B)}{P(supp(A))P(supp(B))}$$

If this is $$1$$, then they are independent.

If this is greater than $$1$$, they are positively dependent.

If this is less than $$1$$, they are negatively dependent.

### Quantitative association rules

#### Quantitative association rules

The search space is infinite in size. For example continuous age.