**The Market-Basket Concept:**

Suppose we work for a market and want to know which *sets of items* **often appear together** in *baskets*. To answer this, we have to examine all the baskets and infer that information from them. In this scenario we assume that the number of baskets is very large compared with both the number of distinct items and the average number of items in a basket.

**Frequent Itemsets:**

But what does *frequent* mean? To be more formal, we need to introduce the concept of a *support threshold*, denoted *s*. Given a set of items *I*, the *support* for *I* is the number of baskets of which *I* is a subset. *I* is then considered *frequent* if its support is greater than or equal to *s*.
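To make the definition concrete, here is a minimal Python sketch. The basket data, the helper names `support` and `is_frequent`, and the threshold value are all made up for illustration; the only thing taken from the text is the definition itself (count the baskets that contain *I*, then compare against *s*).

```python
# Hypothetical toy data: each basket is modelled as a set of items.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "coke"},
    {"milk", "bread", "coke"},
]


def support(itemset, baskets):
    """Return the number of baskets of which `itemset` is a subset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def is_frequent(itemset, baskets, s):
    """An itemset is frequent if its support is at least the threshold s."""
    return support(itemset, baskets) >= s


s = 3  # assumed support threshold, chosen only for this example
print(support({"milk", "bread"}, baskets))           # 3
print(is_frequent({"milk", "bread"}, baskets, s))    # True
print(is_frequent({"bread", "butter"}, baskets, s))  # False
```

Scanning every basket for every candidate itemset is fine for a toy example, but it does not scale to the very large numbers of baskets assumed above.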

Although the concept of **frequent itemsets** was first applied to market research, it applies in other fields as well:

- *Related concepts:* words represent the *items* and documents represent the *baskets*, so a frequent itemset is a group of words that appear together in many documents.
- *Biomarkers:* the items are biomarkers, such as genes and blood proteins, together with diseases. Each basket is the set of data about a patient. A frequent itemset consisting of one or more *biomarkers* and a disease might suggest a test for that disease.

**Association Rules:**

We can think of an association rule as an *if-then* statement. It is written $I \rightarrow j$, where *I* is a set of items and *j* is another item. The implication is that if all the items of *I* appear in a basket, then *j* is *"likely"* to appear in that basket as well.

To formalize this, we need a precise notion of *likely*, which is the **confidence of an association rule** $I \rightarrow j$:

The confidence of an association rule $I \rightarrow j$ is defined as the ratio of the support for $I \cup \{j\}$ to the support for $I$.

That is, the confidence of the rule is the fraction of the baskets containing all of *I* that also contain *j*.
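The confidence ratio can be sketched in a few lines of Python. As before, the basket data and the function names are hypothetical and exist only for illustration; a small `support` helper is included so the block is self-contained.

```python
# Hypothetical toy data: each basket is modelled as a set of items.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "coke"},
    {"milk", "bread", "coke"},
]


def support(itemset, baskets):
    """Return the number of baskets of which `itemset` is a subset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(I, j, baskets):
    """Confidence of the rule I -> j: support(I ∪ {j}) / support(I)."""
    I = set(I)
    support_I = support(I, baskets)
    if support_I == 0:
        # The ratio is undefined when no basket contains all of I.
        raise ValueError("support of I is zero; confidence is undefined")
    return support(I | {j}, baskets) / support_I


print(confidence({"milk"}, "bread", baskets))    # 0.75
print(confidence({"bread"}, "butter", baskets))  # 0.5
```

In this toy data, four baskets contain milk and three of those also contain bread, so the rule {milk} → bread has confidence 3/4.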

In the __next post__, *"Frequent Itemsets Mining: The A-Priori Algorithm in Python explained"*, I am going to talk about the most famous algorithm for **Frequent Itemsets Mining**, the **A-Priori Algorithm**.

**Note:** there are several variants of the A-Priori algorithm, as well as other algorithms for the same task. One worth mentioning, because it is often faster than A-Priori, is **FP-Growth**.