I have exams in a few days and I’m working through some past papers, but unfortunately there are no corresponding answers. I’ve answered the question below and was wondering if someone could tell me whether I am correct.
The question is:
(c) A transactional dataset, T, is given below:
t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes
Assume that minimum support is 0.5 (minsup = 0.5).
(i) Find all frequent itemsets.
Here is how I worked it out:
Item : Amount
Milk : 4
Chicken : 4
Beer : 5
Cheese : 4
Boots : 1
Clothes : 3
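A quick way to double-check these counts is a short Python sketch. The transactions are copied from the dataset above; the names `transactions` and `item_counts` are only illustrative.

```python
from collections import Counter

# Transactions t1..t7, copied from the dataset above.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

# Count how many transactions each item appears in.
item_counts = Counter(item for t in transactions for item in t)

for item, count in item_counts.most_common():
    print(f"{item} : {count}")
```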
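Now, because minsup is 0.5 and there are 7 transactions, an itemset has to appear in at least 4 of them (0.5 × 7 = 3.5). That eliminates Boots and Clothes, and combining the remaining items into pairs gives: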
{items} : Amount
{Milk, Chicken} : 2
{Milk, Beer} : 4
{Milk, Cheese} : 1
{Chicken, Beer} : 3
{Chicken, Cheese} : 3
{Beer, Cheese} : 2
That leaves {Milk, Beer} as the only frequent itemset, since it is the only one that meets minsup. Is that right?
There are two ways to solve the problem:
Assuming that you are using Apriori, the answer you got is correct.
The algorithm is simple:
First, count the support of every 1-itemset and discard those below the minimum support.
Then build candidate 2-itemsets by combining the frequent items from the previous iteration, count their support, and discard those below the support threshold.
Repeat with larger itemsets until no candidate meets the threshold.
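In the problem given to you, only one 2-itemset, {Milk, Beer}, meets the threshold, so no 3-itemset candidates can be formed and you can’t go any further. Keep in mind that the four frequent single items ({Milk}, {Chicken}, {Beer}, {Cheese}) are frequent itemsets too, so the complete answer to "find all frequent itemsets" is those four plus {Milk, Beer}.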
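The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `support_count` and `apriori` are made up for this example, and it assumes support is measured as the fraction of transactions containing the itemset, with "frequent" meaning at or above minsup.

```python
from itertools import combinations

# Transactions t1..t7 from the question.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
MINSUP = 0.5


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, minsup):
    """Return a dict mapping each frequent itemset (frozenset) to its count."""
    n = len(transactions)
    # Level 1: every distinct item is a candidate 1-itemset.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose relative support meets the threshold.
        level = {}
        for c in candidates:
            count = support_count(c, transactions)
            if count / n >= minsup:
                level[c] = count
        frequent.update(level)
        # Join: union pairs of frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


result = apriori(transactions, MINSUP)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The returned dictionary maps each frequent itemset to its transaction count, so the hand-worked tables in the question can be checked against it directly.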
There is a solved example of further steps on Wikipedia here.
You can refer to “Data Mining: Concepts and Techniques” by Han and Kamber for more examples.