FPMax

Finds only maximal frequent itemsets - excludes any itemset that is a subset of another frequent itemset.

When to Use FPMax

Need compact representation
Only care about longest patterns
Want to reduce output size
Don't need all frequent itemsets

Strengths

Compact output (fewer itemsets)
Faster than finding all itemsets
Reduces redundancy significantly
Easier to review results
Lower memory for storing results

Weaknesses

Loses subset information
May miss useful smaller patterns
Cannot derive all rules from maximal itemsets alone
Less comprehensive output than FP-Growth or Apriori

How it Works

FPMax extends the FP-Growth algorithm to find only maximal itemsets:

Use FP-tree structure like FP-Growth
During mining, check whether each candidate itemset is maximal
Skip itemsets (and whole search branches) that are subsets of maximal itemsets already found
Only output itemsets that are not contained in any larger frequent itemset

Key Principle: A maximal frequent itemset is one that is frequent, but none of its immediate supersets are frequent.
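The steps above can be illustrated with a small, self-contained sketch. It is not the FP-tree based FPMax procedure: to stay short it swaps in a depth-first search over per-item transaction-ID sets (an Eclat-style vertical layout). It does, however, apply the same two ideas: prune any branch whose items are all contained in an already-found maximal itemset, and report an itemset only when it has no frequent extension. The function name `mine_maximal` and the sample baskets are hypothetical.

```python
def mine_maximal(transactions, min_support):
    """Return {frozenset(itemset): support} for the maximal frequent itemsets.

    Illustrative only: depth-first search over transaction-ID sets
    (Eclat-style), not an FP-tree. Items are explored in sorted order.
    """
    n = len(transactions)
    min_count = min_support * n

    # Vertical layout: item -> set of indices of transactions containing it.
    tids = {}
    for idx, basket in enumerate(transactions):
        for item in basket:
            tids.setdefault(item, set()).add(idx)

    maximal = {}  # maximal itemsets found so far -> support

    def covered(itemset):
        # True if itemset is a subset of an already-found maximal itemset.
        return any(itemset <= m for m in maximal)

    def dfs(head, head_tids, tail):
        # head: current itemset; tail: [(item, tids of head + item)] for every
        # later item that still forms a frequent itemset with head.
        if covered(head | {item for item, _ in tail}):
            return  # everything below this node is already subsumed
        if not tail:
            # Not covered and no frequent extension: head is maximal.
            # (Supersets using earlier items were explored first.)
            if head:
                maximal[head] = len(head_tids) / n
            return
        for k, (item, item_tids) in enumerate(tail):
            new_tail = []
            for other, other_tids in tail[k + 1:]:
                joint = item_tids & other_tids  # tids of head + item + other
                if len(joint) >= min_count:
                    new_tail.append((other, joint))
            dfs(head | {item}, item_tids, new_tail)

    frequent_items = sorted(i for i, s in tids.items() if len(s) >= min_count)
    dfs(frozenset(), set(range(n)), [(i, tids[i]) for i in frequent_items])
    return maximal


# Hypothetical sample baskets.
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"eggs"},
]
for itemset, support in mine_maximal(baskets, min_support=0.4).items():
    print(sorted(itemset), support)
# ['bread', 'butter', 'milk'] 0.4
# ['eggs'] 0.4
```

Note that [bread, milk] (support 0.6) is frequent but not reported, because it is contained in the maximal [bread, butter, milk].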

What are Maximal Itemsets?

Definition: An itemset is maximal if it is frequent AND no superset of it is frequent.

Example:

All Frequent Itemsets (minimum support of 0.15 or lower):

[A]: support 0.5
[B]: support 0.4
[C]: support 0.3
[A, B]: support 0.3
[A, C]: support 0.25
[B, C]: support 0.2
[A, B, C]: support 0.15

Maximal Itemsets Only:

[A, B, C]: support 0.15

The maximal itemset [A, B, C] implies that [A], [B], [C], [A, B], [A, C], and [B, C] are also frequent (by the Apriori property). FPMax returns only [A, B, C].
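The definition can be checked directly on the example above. This brute-force filter only illustrates what "maximal" means; it is not how FPMax computes the result. It keeps an itemset only if no other frequent itemset is a proper superset of it.

```python
# Frequent itemsets and supports from the example above.
frequent = {
    frozenset({"A"}): 0.5,
    frozenset({"B"}): 0.4,
    frozenset({"C"}): 0.3,
    frozenset({"A", "B"}): 0.3,
    frozenset({"A", "C"}): 0.25,
    frozenset({"B", "C"}): 0.2,
    frozenset({"A", "B", "C"}): 0.15,
}


def maximal_only(frequent):
    """Keep itemsets that have no frequent proper superset (s < other)."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other for other in frequent)
    }


result = maximal_only(frequent)
print([(sorted(s), sup) for s, sup in result.items()])  # [(['A', 'B', 'C'], 0.15)]
```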

Trade-off: You lose the individual support values for subsets. You know they're frequent, but not how frequent.

When to Choose FPMax

Best for:

Exploratory analysis with many items
Need to identify "full patterns" only
Want manageable number of results
Don't need support for every subset
Storage or output size is a concern

Don't use FPMax when:

Need support values for subsets
Want to generate all possible rules
Need comprehensive pattern analysis
Smaller patterns are important for your use case

Parameters

All association algorithms share these common parameters:

Data Format

Input Format: 'long' or 'wide'

How your transaction data is structured:

Wide Format:

Each column represents one item
Each row is a transaction
Values are 1 (item present) or 0 (item absent)

Example:

TransactionID | Bread | Milk | Eggs | Butter
1             | 1     | 1    | 0    | 1
2             | 0     | 1    | 1    | 0

Long Format:

Each row is one item in a transaction
Requires Transaction ID column to group items
More natural for real-world data

Example:

TransactionID | Item
1             | Bread
1             | Milk
1             | Butter
2             | Milk
2             | Eggs
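To make the relationship between the two layouts concrete, here is a minimal sketch in plain Python (no particular library is assumed) that groups the long-format rows above by transaction ID and pivots them into the 0/1 wide layout. The function names are illustrative, and columns come out in alphabetical order rather than the order shown in the wide example.

```python
from collections import defaultdict

# Long format from the example: one (TransactionID, Item) pair per row.
long_rows = [
    (1, "Bread"), (1, "Milk"), (1, "Butter"),
    (2, "Milk"), (2, "Eggs"),
]


def long_to_baskets(rows):
    """Group items by transaction ID into sets (duplicate rows collapse)."""
    baskets = defaultdict(set)
    for tid, item in rows:
        baskets[tid].add(item)
    return dict(baskets)


def baskets_to_wide(baskets):
    """Build a 0/1 table: one column per item, one row per transaction."""
    items = sorted({item for basket in baskets.values() for item in basket})
    header = ["TransactionID"] + items
    rows = [
        [tid] + [int(item in basket) for item in items]
        for tid, basket in sorted(baskets.items())
    ]
    return header, rows


header, rows = baskets_to_wide(long_to_baskets(long_rows))
print(header)  # ['TransactionID', 'Bread', 'Butter', 'Eggs', 'Milk']
for row in rows:
    print(row)  # [1, 1, 1, 0, 1] then [2, 0, 0, 1, 1]
```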

Feature Configuration

Feature Columns (required)

Wide format: List all item columns
Long format: Select the single column containing item names

Transaction ID Column (required for long format): Column that identifies which transaction each item belongs to.

Contains Multiple Items (long format only): Check this if a single row can contain multiple items (e.g., "Bread, Milk, Eggs").

Item Separator (only when Contains Multiple Items is checked): Character separating multiple items (default: comma).

Example: "Bread, Milk, Eggs" uses "," as separator
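A sketch of how a multi-item cell can be split, assuming the separator is "," and that whitespace around each item is trimmed (the trimming is an assumption, not documented behaviour). `split_items` is a hypothetical helper.

```python
def split_items(cell, separator=","):
    """Split a multi-item cell into a set of item names.

    Assumption: leading/trailing whitespace around each item is stripped
    and empty fragments (e.g. from a trailing separator) are dropped.
    """
    return {part.strip() for part in cell.split(separator) if part.strip()}


print(sorted(split_items("Bread, Milk, Eggs")))  # ['Bread', 'Eggs', 'Milk']
```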

Segmentation (Optional)

Segmentation Column: Analyze different customer segments separately, for example:

Store locations (downtown vs. suburban)
Customer types (premium vs. regular)
Time periods (weekday vs. weekend)

Target Segment Value: Restrict the analysis to a specific segment.
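Conceptually, segmentation partitions the transactions by the value of the segmentation column so each partition can be mined separately, and a target value keeps only one partition. A minimal sketch, assuming each transaction belongs to exactly one segment; the rows, segment values, and helper name are hypothetical.

```python
from collections import defaultdict

# Long-format rows with a segment value: (TransactionID, Item, Segment).
rows = [
    (1, "Bread", "weekday"), (1, "Milk", "weekday"),
    (2, "Milk", "weekend"), (2, "Eggs", "weekend"),
    (3, "Bread", "weekend"), (3, "Milk", "weekend"),
]


def baskets_by_segment(rows, target=None):
    """Return {segment: {transaction_id: set_of_items}}.

    If target is given, only rows from that segment are kept.
    """
    segments = defaultdict(lambda: defaultdict(set))
    for tid, item, segment in rows:
        if target is None or segment == target:
            segments[segment][tid].add(item)
    return {seg: dict(baskets) for seg, baskets in segments.items()}


print(baskets_by_segment(rows, target="weekend"))
```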

Model Parameters

Minimum Support (default: 0.02, required): Minimum fraction of transactions in which an itemset must appear to count as frequent.

0.02 = 2% of transactions
Lower values: Find rarer patterns, but run slower and produce more results
Higher values: Find only common patterns, run faster
Recommendations:
- Large stores (>10k transactions): 0.001-0.01 (0.1%-1%)
- Medium stores: 0.01-0.05 (1%-5%)
- Small datasets: 0.05-0.1 (5%-10%)

Maximum Itemset Length (default: 3, required): Maximum number of items in a pattern.

2: Pairs only (A -> B)
3: Triples (A, B -> C)
4+: Complex patterns (slower, harder to interpret)
Recommendations:
- Start with 2-3 for interpretability
- Increase only if needed

Rule Evaluation Metric (default: "lift", required): Metric used to measure rule strength:

lift: Strength of association (recommended)
confidence: Reliability of rule
leverage: Difference between observed and expected co-occurrence
conviction: Dependency strength

Metric Threshold (default: 1.2, required): Minimum value of the selected metric for a rule to be kept.

For lift: >1.0 (1.2 = 20% more likely)
For confidence: 0.5-0.9 (50%-90% probability)

Advanced Filtering (Optional)

Enable Advanced Filtering: Apply confidence and lift thresholds simultaneously for stricter rules.

Minimum Confidence (default: 0.6): Probability that Y is purchased given that X is purchased.

0.6 = 60% of transactions with X also have Y
Range: 0.1-1.0

Minimum Lift (default: 1.1): How much more likely Y is in transactions containing X than in transactions overall.

1.0 = No association (independent)
1.1 = 10% increase in likelihood
2.0 = 2x more likely
Range: >0.0 (typically >1.0 for meaningful rules)
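A sketch of how these thresholds might combine, assuming a rule is kept only if it passes the selected metric threshold and, when Advanced Filtering is enabled, both the minimum confidence and the minimum lift (a logical AND; the exact combination logic is an assumption). The rule records and field names are hypothetical.

```python
# Hypothetical rules with precomputed metrics.
rules = [
    {"rule": "Bread -> Butter", "confidence": 0.60, "lift": 1.50},
    {"rule": "Milk -> Eggs", "confidence": 0.45, "lift": 1.30},
    {"rule": "Eggs -> Bread", "confidence": 0.70, "lift": 1.05},
]


def keep_rule(rule, metric="lift", threshold=1.2,
              advanced=False, min_confidence=0.6, min_lift=1.1):
    """Apply the metric threshold, plus both advanced thresholds if enabled.

    Assumption: every enabled threshold must pass (logical AND).
    """
    if rule[metric] < threshold:
        return False
    if advanced:
        return rule["confidence"] >= min_confidence and rule["lift"] >= min_lift
    return True


basic = [r["rule"] for r in rules if keep_rule(r)]
strict = [r["rule"] for r in rules if keep_rule(r, advanced=True)]
print(basic)   # ['Bread -> Butter', 'Milk -> Eggs']
print(strict)  # ['Bread -> Butter']
```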

Understanding Association Metrics

Support

Definition: How frequently an itemset appears in the database.

Formula: support(X) = (transactions containing X) / (total transactions)

Example:

100 transactions total
[Bread, Milk] appears in 20 transactions
support([Bread, Milk]) = 20/100 = 0.2 = 20%

Interpretation:

0.01 (1%): Rare pattern
0.05 (5%): Moderate frequency
0.2 (20%): Very common pattern

Use: Filter out rare, potentially spurious patterns
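The support formula can be computed directly from raw transactions. A minimal sketch, using a hypothetical set of 100 transactions constructed so that [Bread, Milk] appears together in 20 of them, matching the example above:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


# Hypothetical 100 transactions (list repetition shares the set objects,
# which is fine here because they are only read).
transactions = (
    [{"Bread", "Milk"}] * 20
    + [{"Bread"}] * 30
    + [{"Eggs"}] * 50
)
print(support({"Bread", "Milk"}, transactions))  # 0.2
```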

Confidence

Definition: Probability of finding Y in transactions that contain X.

Formula: confidence(X -> Y) = support(X U Y) / support(X)

Example:

support([Bread]) = 0.5 (50% of transactions)
support([Bread, Butter]) = 0.3 (30% of transactions)
confidence(Bread -> Butter) = 0.3 / 0.5 = 0.6 = 60%

Interpretation:

0.6 = 60% of customers who buy bread also buy butter
Higher confidence = more reliable rule

Limitation: Can be misleading if Y is very common, because a rule reaches high confidence whenever Y appears in most transactions, regardless of X

Lift

Definition: How much more likely Y is in transactions containing X than in all transactions (Y's baseline frequency).

Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)

Example:

confidence(Bread -> Butter) = 0.6
support(Butter) = 0.4 (40% buy butter overall)
lift(Bread -> Butter) = 0.6 / 0.4 = 1.5

Interpretation:

lift = 1.0: No association (X and Y are independent)
lift > 1.0: Positive association (Y more likely with X)
- 1.5 = 50% increase in likelihood
- 2.0 = 2x more likely (100% increase)
lift < 1.0: Negative association (Y less likely with X)

Why Lift is Best for Discovery:

Accounts for item popularity
Separates co-occurrence beyond chance from co-occurrence explained by item popularity alone
Symmetric: lift(X -> Y) = lift(Y -> X)

Leverage

Definition: Difference between observed and expected co-occurrence.

Formula: leverage(X -> Y) = support(X U Y) - support(X) x support(Y)

Example:

support([Bread, Butter]) = 0.3 (observed)
support(Bread) x support(Butter) = 0.5 x 0.4 = 0.2 (expected if independent)
leverage = 0.3 - 0.2 = 0.1

Interpretation:

0: No association
Positive: Items appear together more than expected
Negative: Items appear together less than expected
Magnitude matters: Higher absolute value = stronger relationship

Conviction

Definition: Dependency measure: the ratio of how often the rule would fail (X occurs without Y) if X and Y were independent to how often it actually fails.

Formula: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))

Example:

support(Butter) = 0.4
confidence(Bread -> Butter) = 0.6
conviction = (1 - 0.4) / (1 - 0.6) = 0.6 / 0.4 = 1.5

Interpretation:

= 1.0: No association (independent)
> 1.0: Y depends on X
< 1.0: Y is less likely when X is present
Infinity: Perfect dependency (confidence = 1, so Y always occurs with X)

Use: Measures how much the rule deviates from independence
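All four rule metrics above can be computed from three supports. A minimal sketch in plain Python that reproduces the Bread -> Butter numbers (results are rounded for display because of floating-point error); `rule_metrics` is an illustrative helper, not part of any particular library. It also handles confidence = 1, where conviction is infinite.

```python
import math


def rule_metrics(sup_x, sup_y, sup_xy):
    """Confidence, lift, leverage and conviction for the rule X -> Y,
    computed from support(X), support(Y) and support(X U Y)."""
    confidence = sup_xy / sup_x
    lift = confidence / sup_y
    leverage = sup_xy - sup_x * sup_y
    # Conviction is infinite when confidence == 1 (Y always occurs with X).
    conviction = math.inf if confidence == 1 else (1 - sup_y) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# Values from the Bread -> Butter examples above.
m = rule_metrics(sup_x=0.5, sup_y=0.4, sup_xy=0.3)
print({k: round(v, 4) for k, v in m.items()})
# {'confidence': 0.6, 'lift': 1.5, 'leverage': 0.1, 'conviction': 1.5}
```

Swapping the roles of X and Y (`rule_metrics(0.4, 0.5, 0.3)`) changes confidence and conviction but leaves lift and leverage unchanged, which matches the symmetry noted for lift.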

Configuration Tips

Best Practices for FPMax

When to Use:

Exploratory data analysis with many items
Need high-level patterns only
Output size is a concern
Want to quickly identify major patterns

Recommended Settings:

min_support = 0.02-0.05 (find substantial patterns)
max_length = 3-4 (allow longer maximal patterns)
Focus on interpretation of longest patterns

Understanding Output:

Expect significantly fewer itemsets
Each maximal itemset represents a "family" of subsets
All subsets of maximal itemsets are also frequent
You lose individual support values for subsets

Comparing Output Size

Example with 1000 items, min_support = 0.01:

All Frequent Itemsets (FP-Growth):

1-itemsets: 500
2-itemsets: 5,000
3-itemsets: 10,000
Total: 15,500 itemsets

Maximal Itemsets (FPMax):

Total: ~200 maximal itemsets

Reduction: 98.7% fewer itemsets to review

Common Issues and Solutions

Missing Useful Smaller Patterns

Symptom: Important 2-itemsets not visible in results

Explanation: FPMax only shows maximal itemsets. Smaller patterns are implied but not reported.

Solutions:

If you need smaller patterns, use FP-Growth instead
Manually derive subsets from maximal itemsets
Use FPMax for overview, then FP-Growth for details
Consider whether you truly need all subsets
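If only the maximal itemsets are available, their subsets can still be enumerated, since every subset of a frequent itemset is frequent. A sketch that does this and attaches the only support information that can be inferred: a lower bound equal to the largest support among the maximal itemsets containing the subset. The helper name is hypothetical.

```python
from itertools import combinations


def implied_subsets(maximal):
    """Map each non-empty subset of a maximal itemset to a lower bound on its support.

    By the Apriori property a subset is at least as frequent as any of its
    supersets, so the bound is the max support of the maximal itemsets that
    contain it. The real support can be larger.
    """
    bounds = {}
    for itemset, support in maximal.items():
        items = sorted(itemset)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                subset = frozenset(combo)
                bounds[subset] = max(bounds.get(subset, 0.0), support)
    return bounds


maximal = {frozenset({"A", "B", "C"}): 0.15}
for subset, bound in sorted(implied_subsets(maximal).items(),
                            key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(subset), ">=", bound)
```

For the earlier example this yields a bound of 0.15 for [A], whose actual support is 0.5, which shows how loose the bound can be.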

Fewer Rules than Expected

Symptom: Rule generation produces very few rules

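Explanation: Rules can only be built from the maximal itemsets, and scoring a rule requires the support of its antecedent and consequent, which FPMax does not report for subsets. Many potential rules are therefore lost.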

Solutions:

FPMax is not ideal for comprehensive rule mining
Use FP-Growth or Apriori for full rule generation
FPMax is best for pattern discovery, not rule mining
Consider FPMax for exploration, then switch algorithms

Cannot Reconstruct Subset Support

Symptom: Need support values for subsets of maximal itemsets

Explanation: FPMax doesn't report subset supports

Solutions:

Use FP-Growth instead if you need all support values
FPMax guarantees that subsets are frequent, but not how frequent
Subset support cannot be recovered from a maximal itemset; it only gives a lower bound (each subset is at least as frequent as the maximal itemset containing it)
This is a fundamental limitation of maximal mining

Too Few Maximal Itemsets Found

Symptom: FPMax returns very few or no itemsets

Solutions:

Lower min_support (common cause)
Increase max_length (maximal itemsets tend to be longer)
Verify data format and preprocessing
Check if dataset truly has longer frequent patterns

Unexpected Maximal Itemsets

Symptom: Maximal itemsets seem wrong or surprising

Verification Steps:

Run FP-Growth to see all frequent itemsets
Manually verify maximal itemsets have frequent subsets
Check that no superset of maximal itemset is frequent
Validate against domain knowledge
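The first three verification steps can be automated when the full set of frequent itemsets (for example, exported from an FP-Growth run) is available. A minimal sketch with hypothetical function and variable names:

```python
from itertools import combinations


def check_maximal(claimed, all_frequent):
    """Return a list of problems; an empty list means `claimed` is exactly
    the set of maximal itemsets within `all_frequent` (sets of frozensets)."""
    claimed, all_frequent = set(claimed), set(all_frequent)
    problems = []
    for m in claimed:
        label = sorted(m)
        if m not in all_frequent:
            problems.append(f"{label} is not frequent")
        # Every proper, non-empty subset of a frequent itemset must be frequent.
        for size in range(1, len(m)):
            for combo in combinations(label, size):
                if frozenset(combo) not in all_frequent:
                    problems.append(f"{label}: subset {list(combo)} missing")
        # A maximal itemset must have no frequent proper superset.
        if any(m < other for other in all_frequent):
            problems.append(f"{label} has a frequent superset")
    # Every frequent itemset should lie inside some claimed maximal itemset.
    for s in all_frequent:
        if not any(s <= m for m in claimed):
            problems.append(f"{sorted(s)} is not covered by any claimed itemset")
    return problems


# Single-character item names keep the example short.
frequent = [frozenset(s) for s in ("A", "B", "C", "AB", "AC", "BC", "ABC")]
print(check_maximal([frozenset("ABC")], frequent))      # []
print(len(check_maximal([frozenset("AB")], frequent)))  # 5
```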

Common Cause: Low max_length setting artificially limits maximality. Increase max_length to see truly maximal patterns.
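The effect of max_length can be reproduced with a brute-force sketch, assuming maximality is evaluated within the length-capped set of frequent itemsets as described above. It is exponential in the number of items, so it is for illustration only and is not how FPMax searches; the helper name and baskets are hypothetical. With the cap at 2, the three pairs inside the frequent triple are each reported as maximal; raising the cap to 3 collapses them into the single triple.

```python
from itertools import combinations


def maximal_with_cap(transactions, min_support, max_len):
    """Brute force: frequent itemsets of size <= max_len, then keep those
    with no frequent proper superset *within that capped set*."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            if sum(s <= t for t in transactions) / n >= min_support:
                frequent.add(s)
    return {s for s in frequent if not any(s < o for o in frequent)}


baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"eggs"},
]
for cap in (2, 3):
    found = maximal_with_cap(baskets, min_support=0.4, max_len=cap)
    print(cap, sorted(sorted(s) for s in found))
# 2 [['bread', 'butter'], ['bread', 'milk'], ['butter', 'milk'], ['eggs']]
# 3 [['bread', 'butter', 'milk'], ['eggs']]
```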
