# Association rules

## Association rules

### Association rules

#### The data

We have a transaction dataset, \(D\).

This includes transactions of items in \(I\).

Any subset of \(I\) is an itemset.

A subset of size \(k\) is a \(k\)-itemset.

Transactions are a \(k\)-itemset with a unique id, tid.

The set of all transactions is \(T\).

A tidset is a subset of \(T\).

We have a total order on the items.

An itemset \(ab\) is greater than \(a\), for examplle.

The two points of the lattice are the nullset, and \(I\).

#### Mappings

We have a mapping from \(I\) to \(T\) called \(t\).

We have another mapping from \(T\) to \(I\) called \(i\).

#### Frequency

We define the frequency of an itemset as the number of transactions it appears in.

We can write the frequency of \(A\) as \(\sum A\).

### Strong rules

#### Strong rules

We use frequent patterns to generate strong rules, \(R\).

An example of a strong rule is \(a\rightarrow b\).

We can look at this by comparing the support of \(a\) to \(a\land b\).

\(supp(A\rightarrow B)=P(A\land B)\)

\(conf(A\rightarrow B)=P(B|A)\)

\(conf(A\rightarrow B)=\dfrac{P(A\land B)}{P(A)}\)

### Support

#### Support

We define the support of an itemset as the proportion of transactions which contain the itemset.

\(supp(A)=\dfrac{\sum A }{n}\).

We can also consider this as:

\(supp(A)=P(A)\)

### Frequent patterns

#### Frequent patterns

A frequent itemset is one where the support is above a minimum.

We know that if an itemset is frequent, then all its subsets are also frequent.

We look for frequent patterns, \(F\), between the items \(I\).

An example of a frequent pattern is \(\{a,b\}\).

### Confidence

#### Confidence

The confidence of a frequent pattern is defined as:

\(conf (A\rightarrow B)=\dfrac{supp(A\land B)}{supp(A)}\)

### Finding strong rules using the Apriori algorithm

#### Finding frequent patterns

We can use search algorithms to find frequent patterns. Starting at the empty set.

#### Apriori algorithm

Breadth first search to generate candidate set of itemsets with support above some value.

Start with a \(1\)-itemset, and increase \(k\) once done.

Once we have found a frequent pattern, we can immediately identify other frequent patterns associated with it.

We can do this by looking at confidence, not support.

### Interest

#### Interest

An alternative measure for finding rules is to use interest.

\(Interest(A\rightarrow B)=\dfrac{P(A\land B)}{P(A)P(B)}\)

\(Interest(A\rightarrow B)=\dfrac{supp(A\land B)}{P(supp(A))P(supp(B))}\)

If this is \(1\), then they are independent.

If this is greater than \(1\), they are positively dependent.

If this is less than \(1\), they are negatively dependent.

### Quantitative association rules

#### Quantitative association rules

The search space is infinite in size. For example continuous age.

We choose intervals instead.