FPMax
Finds only maximal frequent itemsets - those that are not a subset of any other frequent itemset - giving a compact representation of the pattern space.
When to Use FPMax
- Need compact representation
- Only care about longest patterns
- Want to reduce output size
- Don't need all frequent itemsets
Strengths
- Compact output (fewer itemsets)
- Faster than finding all itemsets
- Reduces redundancy significantly
- Easier to review results
- Lower memory for storing results
Weaknesses
- Loses subset information
- May miss useful smaller patterns
- Cannot derive all rules from maximal itemsets alone
- Less comprehensive than other algorithms
How it Works
FPMax extends the FP-Growth algorithm to find only maximal itemsets:
- Use FP-tree structure like FP-Growth
- During mining, check if itemset is maximal
- Skip itemsets that are subsets of known frequent itemsets
- Only output itemsets that are not contained in any larger frequent itemset
Key Principle: A maximal frequent itemset is one that is frequent, but none of its immediate supersets are frequent.
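The principle can be sketched in plain Python. This is a brute-force illustration of what "maximal" means, not the FP-tree-based search FPMax actually performs; the transactions and threshold are made up:

```python
from itertools import combinations

# Toy transactions and threshold (made up for illustration).
transactions = [
    {'Bread', 'Milk', 'Butter'},
    {'Milk', 'Eggs'},
    {'Bread', 'Milk'},
    {'Bread', 'Butter'},
]
min_support = 0.5
n = len(transactions)
items = sorted(set().union(*transactions))

# 1. Enumerate all frequent itemsets (FPMax avoids this via an FP-tree).
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = sum(1 for t in transactions if set(combo) <= t) / n
        if s >= min_support:
            frequent[frozenset(combo)] = s

# 2. Keep only maximal itemsets: no frequent proper superset exists.
maximal = {iset: s for iset, s in frequent.items()
           if not any(iset < other for other in frequent)}
```

Here the frequent itemsets {Bread}, {Milk}, and {Butter} are all absorbed into the maximal itemsets {Bread, Milk} and {Bread, Butter}, which is exactly the output reduction FPMax provides.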
What are Maximal Itemsets?
Definition: An itemset is maximal if it is frequent AND no superset of it is frequent.
Example:
All Frequent Itemsets:
[A]: support 0.5
[B]: support 0.4
[C]: support 0.3
[A, B]: support 0.3
[A, C]: support 0.25
[B, C]: support 0.2
[A, B, C]: support 0.15
Maximal Itemsets Only:
[A, B, C]: support 0.15
The maximal itemset [A, B, C] implies that [A], [B], [C], [A, B], [A, C], and [B, C] are also frequent (by the Apriori property). FPMax only returns the longest pattern.
Trade-off: You lose the individual support values for subsets. You know they're frequent, but not how frequent.
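If you do need the subsets themselves (though not their supports), they can be enumerated directly from a maximal itemset. A small sketch:

```python
from itertools import combinations

maximal = ('A', 'B', 'C')

# Every non-empty subset of a maximal frequent itemset is itself frequent
# (downward closure), but FPMax does not report the subsets' supports.
subsets = [frozenset(c)
           for size in range(1, len(maximal) + 1)
           for c in combinations(maximal, size)]
# 2^3 - 1 = 7 frequent itemsets implied by a single maximal 3-itemset
```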
When to Choose FPMax
Best for:
- Exploratory analysis with many items
- Need to identify "full patterns" only
- Want manageable number of results
- Don't need support for every subset
- Storage or output size is a concern
Don't use FPMax when:
- Need support values for subsets
- Want to generate all possible rules
- Need comprehensive pattern analysis
- Smaller patterns are important for your use case
Parameters
All association algorithms share these common parameters:
Data Format
Input Format: 'long' or 'wide'
How your transaction data is structured:
Wide Format:
- Each column represents one item
- Each row is a transaction
- Values are 1 (item present) or 0 (item absent)
- Example:
TransactionID | Bread | Milk | Eggs | Butter
1             | 1     | 1    | 0    | 1
2             | 0     | 1    | 1    | 0
Long Format:
- Each row is one item in a transaction
- Requires Transaction ID column to group items
- More natural for real-world data
- Example:
TransactionID | Item
1             | Bread
1             | Milk
1             | Butter
2             | Milk
2             | Eggs
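Long format is usually what comes out of a database, and it can be pivoted to wide format with pandas. A sketch, with illustrative column names:

```python
import pandas as pd

# Long-format transactions: one row per (transaction, item) pair.
long_df = pd.DataFrame({
    'TransactionID': [1, 1, 1, 2, 2],
    'Item': ['Bread', 'Milk', 'Butter', 'Milk', 'Eggs'],
})

# pd.crosstab counts item occurrences per transaction;
# clip to 0/1 so a repeated item does not inflate the matrix.
wide = (pd.crosstab(long_df['TransactionID'], long_df['Item']) > 0).astype(int)
```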
Feature Configuration
Feature Columns (required)
- Wide format: List all item columns
- Long format: Select the single column containing item names
Transaction ID Column (required for long format) Column that identifies which transaction each item belongs to.
Contains Multiple Items (long format only) Check if a single row can contain multiple items (e.g., "Bread, Milk, Eggs").
Item Separator (if rows contain multiple items) Character separating multiple items in a single cell (default: comma).
- Example: "Bread, Milk, Eggs" uses "," as separator
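Rows like "Bread, Milk, Eggs" can be normalized into one-item-per-row long format with pandas. A sketch, assuming a comma separator:

```python
import pandas as pd

# Each cell in 'Items' holds several items joined by the separator.
df = pd.DataFrame({
    'TransactionID': [1, 2],
    'Items': ['Bread, Milk, Eggs', 'Milk, Butter'],
})

# Split on the separator, explode to one item per row, strip whitespace.
long_df = (
    df.assign(Item=df['Items'].str.split(','))
      .explode('Item')
      .assign(Item=lambda d: d['Item'].str.strip())
      [['TransactionID', 'Item']]
)
```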
Segmentation (Optional)
Segmentation Column Analyze different customer segments separately:
- Store locations (downtown vs. suburban)
- Customer types (premium vs. regular)
- Time periods (weekday vs. weekend)
Target Segment Value Filter to analyze only specific segment.
Model Parameters
Minimum Support (default: 0.02, required) Threshold for how frequently an itemset must appear.
- 0.02 = 2% of transactions
- Lower values: Find rare patterns, but slower and more results
- Higher values: Only common patterns, faster
- Recommendations:
- Large stores (>10k transactions): 0.001-0.01 (0.1%-1%)
- Medium stores: 0.01-0.05 (1%-5%)
- Small datasets: 0.05-0.1 (5%-10%)
Maximum Itemset Length (default: 3, required) Maximum number of items in a pattern.
- 2: Pairs only (A -> B)
- 3: Triples (A, B -> C)
- 4+: Complex patterns (slower, harder to interpret)
- Recommendations:
- Start with 2-3 for interpretability
- Increase only if needed
Rule Evaluation Metric (default: "lift", required) How to measure rule strength:
- lift: Strength of association (recommended)
- confidence: Reliability of rule
- leverage: Lift adjusted by item frequencies
- conviction: Dependency strength
Metric Threshold (default: 1.2, required) Minimum value for the selected metric to keep a rule.
- For lift: >1.0 (1.2 = 20% more likely)
- For confidence: 0.5-0.9 (50%-90% probability)
Advanced Filtering (Optional)
Enable Advanced Filtering Set both confidence and lift thresholds simultaneously for stricter rules.
Minimum Confidence (default: 0.6) Probability that Y is purchased given X is purchased.
- 0.6 = 60% of transactions with X also have Y
- Range: 0.1-1.0
Minimum Lift (default: 1.1) How much more likely Y is with X versus without X.
- 1.0 = No association (independent)
- 1.1 = 10% increase in likelihood
- 2.0 = 2x more likely
- Range: >0.0 (typically >1.0 for meaningful rules)
Understanding Association Metrics
Support
Definition: How frequently an itemset appears in the database.
Formula: support(X) = (transactions containing X) / (total transactions)
Example:
- 100 transactions total
- [Bread, Milk] appears in 20 transactions
- support([Bread, Milk]) = 20/100 = 0.2 = 20%
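The formula translates directly into code. A minimal sketch with made-up transactions:

```python
# Toy transactions (made up): 5 baskets.
transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Milk', 'Eggs'},
    {'Milk'},
    {'Bread'},
    {'Eggs'},
]

def support(itemset, transactions):
    """support(X) = (transactions containing X) / (total transactions)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

print(support({'Bread', 'Milk'}, transactions))  # 0.4 (2 of 5 baskets)
```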
Interpretation:
- 0.01 (1%): Rare pattern
- 0.05 (5%): Moderate frequency
- 0.2 (20%): Very common pattern
Use: Filter out rare, potentially spurious patterns
Confidence
Definition: Probability of finding Y in transactions that contain X.
Formula: confidence(X -> Y) = support(X U Y) / support(X)
Example:
- support([Bread]) = 0.5 (50% of transactions)
- support([Bread, Butter]) = 0.3 (30% of transactions)
- confidence(Bread -> Butter) = 0.3 / 0.5 = 0.6 = 60%
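As a one-line function, reusing the example numbers above:

```python
def confidence(support_xy, support_x):
    """confidence(X -> Y) = support(X U Y) / support(X)."""
    return support_xy / support_x

# Bread -> Butter: support([Bread, Butter]) = 0.3, support([Bread]) = 0.5
print(confidence(0.3, 0.5))  # 0.6
```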
Interpretation:
- 0.6 = 60% of customers who buy bread also buy butter
- Higher confidence = more reliable rule
Limitation: Can be misleading if Y is very common
Lift
Definition: How much more likely Y is with X versus without X.
Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)
Example:
- confidence(Bread -> Butter) = 0.6
- support(Butter) = 0.4 (40% buy butter overall)
- lift(Bread -> Butter) = 0.6 / 0.4 = 1.5
Interpretation:
- lift = 1.0: No association (X and Y are independent)
- lift > 1.0: Positive association (Y more likely with X)
- 1.5 = 50% increase in likelihood
- 2.0 = 2x more likely (100% increase)
- lift < 1.0: Negative association (Y less likely with X)
Why Lift is Best for Discovery:
- Accounts for item popularity
- Detects true associations vs. coincidence
- Symmetric: lift(X -> Y) = lift(Y -> X)
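The lift computation, again with the running Bread -> Butter numbers:

```python
def lift(conf_xy, support_y):
    """lift(X -> Y) = confidence(X -> Y) / support(Y)."""
    return conf_xy / support_y

# confidence(Bread -> Butter) = 0.6, support(Butter) = 0.4
print(round(lift(0.6, 0.4), 4))  # 1.5
```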
Leverage
Definition: Difference between observed and expected co-occurrence.
Formula: leverage(X -> Y) = support(X U Y) - support(X) x support(Y)
Example:
- support([Bread, Butter]) = 0.3 (observed)
- support(Bread) x support(Butter) = 0.5 x 0.4 = 0.2 (expected if independent)
- leverage = 0.3 - 0.2 = 0.1
Interpretation:
- 0: No association
- Positive: Items appear together more than expected
- Negative: Items appear together less than expected
- Magnitude matters: Higher absolute value = stronger relationship
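Leverage in code, with the same example values:

```python
def leverage(support_xy, support_x, support_y):
    """leverage(X -> Y) = support(X U Y) - support(X) * support(Y)."""
    return support_xy - support_x * support_y

# observed 0.3 vs. 0.5 * 0.4 = 0.2 expected under independence
print(round(leverage(0.3, 0.5, 0.4), 4))  # 0.1
```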
Conviction
Definition: Dependency measure - how much more Y depends on X.
Formula: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))
Example:
- support(Butter) = 0.4
- confidence(Bread -> Butter) = 0.6
- conviction = (1 - 0.4) / (1 - 0.6) = 0.6 / 0.4 = 1.5
Interpretation:
- 1.0: No association (independent)
- > 1.0: Y depends on X
- infinity: Perfect dependency (Y always occurs when X occurs)
Use: Measures how much the rule deviates from independence
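Conviction in code, guarding against the divide-by-zero case when confidence is 1.0:

```python
def conviction(support_y, conf_xy):
    """conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))."""
    if conf_xy == 1.0:
        return float('inf')  # perfect dependency: Y always follows X
    return (1 - support_y) / (1 - conf_xy)

# support(Butter) = 0.4, confidence(Bread -> Butter) = 0.6
print(round(conviction(0.4, 0.6), 4))  # 1.5
```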
Configuration Tips
Best Practices for FPMax
When to Use:
- Exploratory data analysis with many items
- Need high-level patterns only
- Output size is a concern
- Want to quickly identify major patterns
Recommended Settings:
- min_support = 0.02-0.05 (find substantial patterns)
- max_length = 3-4 (allow longer maximal patterns)
- Focus on interpretation of longest patterns
Understanding Output:
- Expect significantly fewer itemsets
- Each maximal itemset represents a "family" of subsets
- All subsets of maximal itemsets are also frequent
- You lose individual support values for subsets
Comparing Output Size
Example with 1000 items, min_support = 0.01:
All Frequent Itemsets (FP-Growth):
- 1-itemsets: 500
- 2-itemsets: 5,000
- 3-itemsets: 10,000
- Total: 15,500 itemsets
Maximal Itemsets (FPMax):
- Total: ~200 maximal itemsets
Reduction: 98.7% fewer itemsets to review
Common Issues and Solutions
Missing Useful Smaller Patterns
Symptom: Important 2-itemsets not visible in results
Explanation: FPMax only shows maximal itemsets. Smaller patterns are implied but not reported.
Solutions:
- If you need smaller patterns, use FP-Growth instead
- Manually derive subsets from maximal itemsets
- Use FPMax for overview, then FP-Growth for details
- Consider whether you truly need all subsets
Fewer Rules than Expected
Symptom: Rule generation produces very few rules
Explanation: Rules are only generated from maximal itemsets, losing many potential rules.
Solutions:
- FPMax is not ideal for comprehensive rule mining
- Use FP-Growth or Apriori for full rule generation
- FPMax is best for pattern discovery, not rule mining
- Consider FPMax for exploration, then switch algorithms
Cannot Reconstruct Subset Support
Symptom: Need support values for subsets of maximal itemsets
Explanation: FPMax doesn't report subset supports
Solutions:
- Use FP-Growth instead if you need all support values
- FPMax guarantees that subsets are frequent, but not how frequent
- Cannot reliably estimate subset support from maximal itemset
- This is a fundamental limitation of maximal mining
Too Few Maximal Itemsets Found
Symptom: FPMax returns very few or no itemsets
Solutions:
- Lower min_support (common cause)
- Increase max_length (maximal itemsets tend to be longer)
- Verify data format and preprocessing
- Check if dataset truly has longer frequent patterns
Unexpected Maximal Itemsets
Symptom: Maximal itemsets seem wrong or surprising
Verification Steps:
- Run FP-Growth to see all frequent itemsets
- Manually verify maximal itemsets have frequent subsets
- Check that no superset of maximal itemset is frequent
- Validate against domain knowledge
Common Cause: Low max_length setting artificially limits maximality. Increase max_length to see truly maximal patterns.