Association Analysis

Association analysis is an unsupervised learning task that discovers relationships between items that frequently occur together in transactions. Unlike classification or regression, there's no target variable—the goal is to uncover hidden patterns and associations in transactional data.

Training Association Models

To train association models, see the Association Model Training Guide, which documents the parameters for all 5 available algorithms: Apriori, FP-Growth, Eclat, Relim, and FPMax.

What Is Association Analysis

Association analysis identifies patterns where items tend to appear together more frequently than would be expected by chance. The most famous application is market basket analysis, which answers questions like "What products do customers buy together?"

Common use cases:

Product recommendation systems
Store layout optimization
Cross-selling and promotional bundling
Customer behavior analysis
Web page navigation patterns
Medical diagnosis (symptom combinations)

Classic Example: The often-retold "beer and diapers" story holds that a retailer found that customers who buy diapers often buy beer in the same transaction, an unexpected but actionable pattern for store layout and promotions.

Key Concepts

Transactions and Items

Transaction: A collection of items that occur together in a single event. Examples:

Shopping cart: All items purchased together
Web session: Pages visited in one session
Medical record: Symptoms or treatments for one patient

Item: A discrete entity that can appear in transactions. Items are typically:

Products (SKUs) in retail
Web pages or features in clickstream data
Symptoms or medications in healthcare
Words in text documents

Itemsets

Itemset: A collection of one or more items.

1-itemset: [Bread]
2-itemset: [Bread, Butter]
3-itemset: [Bread, Butter, Milk]

Frequent Itemset: An itemset whose support (the fraction of transactions that contain it) is at least the min_support threshold.

Association Rules

Rule Format: X -> Y (read as "if X then Y")

Antecedent (X): The "if" part—items on the left side
Consequent (Y): The "then" part—items on the right side
Example: [Bread, Butter] -> [Milk]
Meaning: Customers who buy bread and butter also tend to buy milk

Key Point: Rules are directional, but direction does not imply causation. [Bread] -> [Butter] and [Butter] -> [Bread] are different rules with potentially different confidence values.
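To make the directionality concrete, the following minimal sketch enumerates every way a single itemset can be split into an antecedent and a consequent. The function name candidate_rules is an illustrative choice, not part of any particular library.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an itemset.

    Both sides must be non-empty, so a k-itemset yields 2**k - 2 rules.
    """
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

rules = list(candidate_rules({"Bread", "Butter", "Milk"}))
for antecedent, consequent in rules:
    print(list(antecedent), "->", list(consequent))
```

Each split is a separate rule that has to be scored on its own, which is why [Bread] -> [Butter] and [Butter] -> [Bread] can end up with different metric values.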

Understanding Association Metrics

Association rules are evaluated using several metrics that measure different aspects of the relationship.

Support

What it measures: How frequently an itemset appears in the data.

Formula: support(X) = (# transactions containing X) / (total # transactions)

Example:

1000 transactions total
[Bread, Milk] appears in 150 transactions
support([Bread, Milk]) = 150/1000 = 0.15 = 15%

Why it matters: Support filters out rare patterns that might be noise. Very low support patterns (< 1%) might be spurious or not actionable at scale.

Typical thresholds:

Large datasets (>10k transactions): 0.001-0.01 (0.1%-1%)
Medium datasets: 0.01-0.05 (1%-5%)
Small datasets: 0.05-0.1 (5%-10%)
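The support formula can be sketched directly in Python. The five toy transactions below are invented for illustration and are much smaller than any dataset the thresholds above are meant for.

```python
# Hypothetical toy data: each transaction is a set of item names.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
    {"Butter", "Eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

support_bread = support({"Bread"}, transactions)               # 3 of 5 transactions
support_bread_milk = support({"Bread", "Milk"}, transactions)  # 2 of 5 transactions
print(support_bread, support_bread_milk)
```

An itemset counts as frequent when the returned value is at least min_support.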

Confidence

What it measures: The reliability of the rule—how often Y appears when X appears.

Formula: confidence(X -> Y) = support(X ∪ Y) / support(X)

Example:

support([Bread]) = 0.50 (50% of transactions)
support([Bread, Butter]) = 0.30 (30% of transactions)
confidence(Bread -> Butter) = 0.30 / 0.50 = 0.60 = 60%

Interpretation: 60% of customers who buy bread also buy butter.

Limitation: High confidence doesn't always mean strong association. If butter appears in 60% of all transactions anyway, this rule isn't particularly informative.
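A small sketch of the confidence calculation, using the bread and butter figures from the example above. The total of 1000 transactions and the butter count of 400 are assumptions added for illustration so that the reverse rule can be computed too.

```python
def confidence(count_xy, count_x):
    """confidence(X -> Y) = support(X ∪ Y) / support(X).

    Both supports share the same denominator (the total number of
    transactions), so the ratio of raw counts gives the same result.
    """
    if count_x == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count_xy / count_x

# Counts for an assumed total of 1000 transactions.
n_bread = 500          # support([Bread]) = 0.50
n_bread_butter = 300   # support([Bread, Butter]) = 0.30
n_butter = 400         # assumed for illustration: support([Butter]) = 0.40

conf_bread_to_butter = confidence(n_bread_butter, n_bread)
conf_butter_to_bread = confidence(n_bread_butter, n_butter)
print(f"Bread -> Butter: {conf_bread_to_butter:.2f}")
print(f"Butter -> Bread: {conf_butter_to_bread:.2f}")
```

The two directions share the same numerator but divide by different antecedent counts, which is why confidence is not symmetric.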

Lift

What it measures: How much more likely Y is to appear with X compared to Y's baseline frequency.

Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)

Example:

confidence(Bread -> Butter) = 0.60
support(Butter) = 0.40 (40% of all transactions contain butter)
lift(Bread -> Butter) = 0.60 / 0.40 = 1.5

Interpretation:

lift = 1.0: X and Y are independent (no association)
lift > 1.0: Positive association (Y more likely with X)
- 1.5 = 50% increase in likelihood
- 2.0 = 100% increase (twice as likely)
lift < 1.0: Negative association (Y less likely with X)

Why lift is crucial: It accounts for item popularity. A rule with 90% confidence but lift of 1.0 means the consequent is just a popular item, not meaningfully associated with the antecedent.

Best for discovery: Lift is symmetric [lift(X -> Y) = lift(Y -> X)], so it measures the strength of the co-occurrence itself rather than a direction, and it is not inflated when the consequent is simply a popular item.
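The lift formula can be rewritten as support(X ∪ Y) / (support(X) × support(Y)), which makes the symmetry visible. The sketch below uses the bread and butter supports from the examples above; the function signature is an illustrative choice.

```python
def lift(support_xy, support_x, support_y):
    """lift(X -> Y) = confidence(X -> Y) / support(Y)
                    = support(X ∪ Y) / (support(X) * support(Y))."""
    return support_xy / (support_x * support_y)

support_bread = 0.50
support_butter = 0.40
support_both = 0.30

lift_bread_butter = lift(support_both, support_bread, support_butter)
lift_butter_bread = lift(support_both, support_butter, support_bread)
print(round(lift_bread_butter, 2), round(lift_butter_bread, 2))
```

Swapping X and Y only swaps the two factors in the denominator, so both directions give the same value.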

Other Metrics

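Leverage: Measures how much more often X and Y occur together than expected if they were independent: leverage(X -> Y) = support(X ∪ Y) - support(X) × support(Y). A value of 0 indicates independence; positive values indicate positive association.

Conviction: Compares how often X would appear without Y if the two were independent with how often that actually happens: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y)). A value of 1 indicates independence, values > 1 indicate that Y depends on X, and infinity means the rule always holds (confidence = 1).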
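As a rough sketch, both metrics can be computed from the same three supports used for lift. The definitions assumed here are the commonly used ones: leverage = support(X ∪ Y) - support(X) × support(Y), and conviction = (1 - support(Y)) / (1 - confidence(X -> Y)). The input values are the bread and butter figures from the earlier examples.

```python
import math

def leverage(support_xy, support_x, support_y):
    """Observed joint support minus the joint support expected under independence."""
    return support_xy - support_x * support_y

def conviction(support_xy, support_x, support_y):
    """(1 - support(Y)) / (1 - confidence(X -> Y)); infinite when confidence is 1."""
    conf = support_xy / support_x
    if conf >= 1.0:
        return math.inf
    return (1.0 - support_y) / (1.0 - conf)

lev = leverage(0.30, 0.50, 0.40)
conv = conviction(0.30, 0.50, 0.40)
print(round(lev, 3), round(conv, 3))
```

Unlike lift, conviction is directional: swapping X and Y changes both the confidence and the baseline support it is compared against.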

Market Basket Analysis

Market basket analysis is the most common application of association analysis, focused on retail transactions.

The Goal

Understand which products are purchased together to:

Recommend complementary products
Design promotional bundles (e.g., "buy bread, get 20% off butter")
Optimize store layout (place associated items near each other)
Plan inventory (stock complementary items together)
Create targeted marketing campaigns

Types of Associations

Complementary items: Products used together

[Toothbrush] -> [Toothpaste]
[Burger buns] -> [Ground beef]
[Pasta] -> [Pasta sauce]

Substitute items: Products rarely bought together (negative association)

[Coke] and [Pepsi] (lift < 1.0)
[iPhone] and [Android phone]

Unexpected associations: Surprising patterns that require investigation

[Diapers] -> [Beer] (lifestyle factors)
[Batteries] -> [Toys] (seasonal Christmas shopping)

When to Use Association Analysis

Good fit:

Transaction data with multiple items per transaction
Want to discover patterns without predefined target
Need recommendations or cross-selling strategies
Interested in understanding co-occurrence patterns
Have at least hundreds of transactions

Poor fit:

Single-item transactions (no co-occurrence to analyze)
Time-series prediction (use time series methods instead)
Classification into predefined categories (use classification)
Very few transactions (< 100) or very few items
Need to predict specific outcomes (use supervised learning)

Choosing Between Algorithms

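Apriori, FP-Growth, Eclat, and Relim find the same frequent itemsets (and therefore the same rules) for a given min_support; they differ in search strategy and performance. FPMax is the exception: it returns only the maximal itemsets.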

Algorithm Comparison

Apriori:

Strategy: Breadth-first, generate and test candidates
Best for: Learning, understanding the fundamentals
Speed: Slower on large datasets
Memory: Can be high with low support
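The generate-and-test strategy can be sketched in a few lines of plain Python. This is a teaching sketch under simplifying assumptions (small in-memory data, no optimizations), not a production implementation; the toy transactions are invented.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets, level by level."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Test step: count each candidate and keep those meeting min_support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
    {"Butter", "Eggs"},
]
frequent_itemsets = apriori(transactions, min_support=0.4)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Each level rescans all transactions, which is one reason this strategy slows down on large datasets and low support thresholds.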

FP-Growth:

Strategy: Tree-based, no candidate generation
Best for: Production use, most datasets
Speed: Fastest for most scenarios
Memory: Efficient

Eclat:

Strategy: Vertical format, set intersection
Best for: Sparse data (many items, few per transaction)
Speed: Fast for sparse datasets
Memory: Higher for dense data
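The vertical format can be illustrated with a short sketch: each item maps to the set of transaction IDs that contain it, and the support of a larger itemset comes from intersecting those sets. This is a simplified depth-first version on invented toy data, without the optimizations a real implementation would use.

```python
def eclat(transactions, min_support):
    """Return {frozenset: support} using transaction-ID set intersections."""
    n = len(transactions)

    # Vertical format: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that are frequent with `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids) / n
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # transactions containing both
                if len(shared) / n >= min_support:
                    narrower.append((other, shared))
            extend(itemset, narrower)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) / n >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return result

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
    {"Butter", "Eggs"},
]
eclat_itemsets = eclat(transactions, min_support=0.4)
print(len(eclat_itemsets), "frequent itemsets")
```

With sparse data the ID sets are short and intersections are cheap; with dense data each ID set approaches the full transaction list, which is where the higher memory use comes from.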

Relim:

Strategy: Recursive elimination
Best for: Memory-constrained environments
Speed: Moderate
Memory: Most memory-efficient

FPMax:

Strategy: Finds only maximal itemsets
Best for: Compact output, overview of longest patterns
Speed: Faster than finding all itemsets
Memory: Lower output size
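A maximal itemset is a frequent itemset with no frequent proper superset. The sketch below illustrates that definition by post-filtering an already mined list; it is not the FPMax algorithm itself, which avoids enumerating all frequent itemsets in the first place. The input list is a hypothetical mining result.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only itemsets that are not a proper subset of another frequent itemset."""
    sets = [frozenset(s) for s in frequent_itemsets]
    return [s for s in sets if not any(s < other for other in sets)]

# Hypothetical output of a frequent-itemset miner.
mined = [
    {"Bread"}, {"Butter"}, {"Milk"},
    {"Bread", "Butter"}, {"Bread", "Milk"},
]
maximal = maximal_itemsets(mined)
print([sorted(s) for s in maximal])
```

Every frequent itemset is a subset of some maximal one, so the maximal list is a compact overview, but the supports of the dropped subsets are no longer available from it.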

Quick Selection Guide

Start with FP-Growth for most use cases. Consider alternatives if:

Learning the concepts → Apriori
Sparse data (e.g., supermarket with 10k SKUs, baskets of 10 items) → Eclat
Memory constraints → Relim
Only want longest patterns → FPMax

Practical Considerations

Data Format

Association algorithms require transaction data in one of two formats:

Wide format:

Each column is an item
Each row is a transaction
Values are 1 (present) or 0 (absent)

Long format:

Each row is one item in a transaction
Transaction ID column groups items
More natural for real-world data

Most systems handle long format more naturally (database tables, CSV exports).
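Converting between the two formats is straightforward. The sketch below groups long-format rows into baskets and then builds a 0/1 wide table using only the standard library; the transaction IDs and items are invented, and the function names are illustrative.

```python
from collections import defaultdict

# Long format: one (transaction_id, item) pair per row.
long_rows = [
    ("T1", "Bread"), ("T1", "Milk"),
    ("T2", "Bread"), ("T2", "Butter"),
    ("T3", "Milk"),
]

def long_to_baskets(rows):
    """Group long-format rows into {transaction_id: set_of_items}."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)

def baskets_to_wide(baskets):
    """Build a wide 0/1 table: one column per item, one row per transaction."""
    columns = sorted({item for items in baskets.values() for item in items})
    table = {
        tid: [1 if column in items else 0 for column in columns]
        for tid, items in baskets.items()
    }
    return columns, table

baskets = long_to_baskets(long_rows)
columns, wide = baskets_to_wide(baskets)
print(columns)
for tid, row in wide.items():
    print(tid, row)
```

The wide table grows with the number of distinct items, so for catalogs with thousands of SKUs the basket (set) representation is usually the more compact intermediate form.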

Parameter Tuning

Minimum Support:

Too low: Too many patterns, slow computation, noise
Too high: Miss interesting rare patterns
Start: 0.02 (2%), adjust based on results

Maximum Itemset Length:

2: Pairwise associations only (easiest to interpret)
3: Include three-way patterns (typical max)
4+: Harder to interpret, exponentially more patterns

Rule Metric:

Lift: Best for discovery (accounts for popularity)
Confidence: For reliability requirements
Leverage/Conviction: Alternative strength measures

Filtering and Validation

Initial filtering:

Set min_support to reduce candidates
Use min_lift > 1.5 for strong associations
Set min_confidence > 0.5 for reliable rules

Post-processing:

Remove trivial rules (obvious associations)
Sort by lift to find strongest associations
Focus on actionable patterns
Validate with domain experts

Common pitfalls:

High confidence + low lift = popular item, not true association
Very low support = might be noise
Contradicts domain knowledge = investigate or discard
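The filtering steps above can be sketched as a single function that applies the three thresholds and sorts by lift. The rules and their metric values below are invented to show each pitfall, and the dictionary layout is an assumption rather than any tool's output format.

```python
def filter_rules(rules, min_support=0.02, min_confidence=0.5, min_lift=1.5):
    """Keep rules that meet all three thresholds, strongest lift first."""
    kept = [
        r for r in rules
        if r["support"] >= min_support
        and r["confidence"] >= min_confidence
        and r["lift"] >= min_lift
    ]
    return sorted(kept, key=lambda r: r["lift"], reverse=True)

# Hypothetical rules with made-up metric values.
rules = [
    {"rule": "Bread -> Butter", "support": 0.30, "confidence": 0.60, "lift": 1.5},
    {"rule": "Pasta -> Pasta sauce", "support": 0.05, "confidence": 0.70, "lift": 3.2},
    # High confidence but lift of 1.0: the consequent is just popular.
    {"rule": "Candles -> Milk", "support": 0.08, "confidence": 0.90, "lift": 1.0},
    # Strong lift but very low support: might be noise.
    {"rule": "Caviar -> Champagne", "support": 0.001, "confidence": 0.80, "lift": 9.0},
]

selected = filter_rules(rules)
for r in selected:
    print(r["rule"], r["lift"])
```

Removing trivial rules and checking against domain knowledge still has to happen by hand; thresholds only narrow the list that experts need to review.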

Evaluation

Association analysis has no single accuracy metric. Evaluate based on:

Internal quality:

Are patterns frequent enough to be actionable?
Do rules have strong lift values (> 1.5)?
Are confidence levels adequate?

Business value:

Are patterns surprising and useful?
Can insights drive actions (recommendations, promotions)?
Do they align with domain knowledge?

Experimental validation:

A/B test recommendations
Measure conversion rates
Track bundle sales performance

Example Workflow

1. Prepare Transaction Data

Format: Transaction ID + Items
Clean: Remove returns, test orders
Filter: Focus on specific product categories or time periods

2. Choose Algorithm

Most cases: FP-Growth
Learning: Apriori
Sparse data: Eclat

3. Set Initial Parameters

min_support = 0.02 (2%)
max_length = 3
rule_metric = lift
min_threshold = 1.5 (minimum value of the chosen rule_metric, here lift)

4. Mine Patterns

Run the algorithm and examine results:

How many itemsets found?
How many rules generated?
What's the distribution of lift values?
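One way to answer these three questions is a small summary over the mined results. The sketch below assumes itemsets are available as sets and rules as dictionaries with a "lift" key; both layouts and the sample values are assumptions for illustration.

```python
from collections import Counter
from statistics import median

def summarize(itemsets, rules):
    """Count itemsets and rules and describe the spread of lift values."""
    lifts = [r["lift"] for r in rules]
    return {
        "n_itemsets": len(itemsets),
        "itemsets_by_length": dict(Counter(len(s) for s in itemsets)),
        "n_rules": len(rules),
        "lift_min": min(lifts, default=None),
        "lift_median": median(lifts) if lifts else None,
        "lift_max": max(lifts, default=None),
        "share_lift_above_1": sum(l > 1.0 for l in lifts) / len(lifts) if lifts else None,
    }

# Hypothetical mining output.
itemsets = [{"Bread"}, {"Butter"}, {"Bread", "Butter"}]
rules = [{"lift": 1.5}, {"lift": 1.5}, {"lift": 0.8}, {"lift": 2.4}]

summary = summarize(itemsets, rules)
for key, value in summary.items():
    print(key, value)
```

A median lift close to 1.0 suggests most rules reflect item popularity rather than association, which is a signal to raise the lift threshold before reviewing rules individually.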

5. Adjust and Refine

Too few patterns:

Lower min_support to 0.01
Reduce min_lift threshold
Increase max_length

Too many patterns:

Increase min_support to 0.05
Increase min_lift to 2.0
Enable advanced filtering (confidence + lift)
Focus on specific categories

6. Validate and Apply

Review top rules by lift
Validate with domain experts
Filter for actionable insights
Implement recommendations or strategies
Measure business impact

Relationship to Other Tasks

vs. Clustering:

Clustering: Groups similar transactions or customers
Association: Finds item co-occurrence patterns within transactions
Can combine: Cluster customers, then find associations within each cluster

vs. Collaborative Filtering:

Collaborative Filtering: Recommends based on user-item ratings (e.g., "users like you also liked...")
Association: Recommends based on item co-occurrence (e.g., "people who bought X also bought Y")
Use together: Blend both approaches for recommendations

vs. Sequential Pattern Mining:

Sequential: Finds patterns in ordered sequences (e.g., page A → page B → page C)
Association: Finds co-occurrence regardless of order
Choose sequential when: Order matters (clickstreams, customer journeys)

Common Applications

Retail:

Product recommendations
Bundle promotions
Store layout optimization
Inventory management

E-commerce:

"Frequently bought together" suggestions
Personalized product recommendations
Cart completion prompts

Healthcare:

Disease-symptom associations
Drug interaction patterns
Treatment protocol analysis

Web Analytics:

Page navigation patterns
Feature usage combinations
User behavior clustering

Finance:

Fraud detection (unusual transaction patterns)
Service bundle recommendations
Cross-selling financial products

Association analysis is exploratory and insight-driven. The goal is to discover actionable patterns that weren't obvious beforehand. Success depends on domain knowledge, proper filtering, and validation through real-world testing.
