Relim
Memory-efficient recursive elimination algorithm
Relim (Recursive Elimination) builds frequent itemsets by recursively eliminating items from the transaction database while maintaining frequency counts.
When to Use Relim
- Memory-constrained environments
- Need efficient recursive approach
- Alternative to FP-Growth
- Balance of speed and memory efficiency
Strengths
- More memory-efficient than tree-based alternatives such as FP-Growth
- Recursive approach is elegant
- Good balance of speed and memory
- Handles moderate to large datasets well
- No candidate generation
Weaknesses
- Less popular than FP-Growth
- Fewer optimizations available
- Not as fast as FP-Growth on most datasets
- Less documentation and community support
How it Works
- Build initial item frequency lists
- Recursively eliminate items while tracking patterns
- Build frequent itemsets through elimination process
- Maintain only necessary information at each recursion level
Key Advantage: The recursive elimination approach keeps memory usage lower than building explicit tree structures, while still avoiding candidate generation.
Recursive Process:
- At each level, eliminate one item from consideration
- Track which transactions remain relevant
- Recursively process remaining items
- Build itemsets bottom-up from elimination
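The recursive process above can be sketched in a few lines of Python. This is a simplified, illustrative reimplementation of the idea using sets, not Borgelt's optimized prefix-list version; the demo data and all names are invented for the example.

```python
from collections import Counter

def relim(transactions, min_support_count, prefix=(), results=None):
    """Simplified Relim sketch: transactions is a list of sets of items."""
    if results is None:
        results = {}
    counts = Counter(item for t in transactions for item in t)
    # Visit items from least to most frequent, as Relim prescribes.
    for item, count in sorted(counts.items(), key=lambda kv: kv[1]):
        if count < min_support_count:
            continue
        found = prefix + (item,)
        results[frozenset(found)] = count
        # Conditional database: transactions containing `item`, with `item` removed.
        conditional = [t - {item} for t in transactions if item in t and len(t) > 1]
        relim(conditional, min_support_count, found, results)
        # Eliminate `item` before moving on to the next item at this level.
        transactions = [t - {item} for t in transactions]
    return results

demo = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}, {"a"}]
frequent = relim(demo, min_support_count=2)
print(sorted((sorted(k), v) for k, v in frequent.items()))
# [(['a'], 4), (['a', 'b'], 2), (['a', 'c'], 2), (['b'], 3), (['c'], 2)]
```

Note how the only per-level state is the current (conditional) transaction list and the prefix, which is what keeps memory usage lower than an explicit tree structure.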
When to Choose Relim
Best for:
- Embedded systems or limited memory environments
- Moderate-sized datasets (10k-1M transactions)
- Need predictable memory usage
- When FP-Growth uses too much memory
Choose FP-Growth instead when:
- Performance is critical
- Memory is not constrained
- Need fastest possible mining
- Large community support is important
Choose Apriori instead when:
- Learning and understanding
- Very small datasets
- Interpretability is key
Parameters
All association algorithms share these common parameters:
Data Format
Input Format: 'long' or 'wide'
How your transaction data is structured:
Wide Format:
- Each column represents one item
- Each row is a transaction
- Values are 1 (item present) or 0 (item absent)
- Example:
TransactionID | Bread | Milk | Eggs | Butter
1             | 1     | 1    | 0    | 1
2             | 0     | 1    | 1    | 0
Long Format:
- Each row is one item in a transaction
- Requires Transaction ID column to group items
- More natural for real-world data
- Example:
TransactionID | Item
1             | Bread
1             | Milk
1             | Butter
2             | Milk
2             | Eggs
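As a sketch, converting long format to wide format takes only standard Python. The rows below mirror the examples above; the variable names are purely illustrative.

```python
from collections import defaultdict

# Long-format rows as (TransactionID, Item) pairs.
long_rows = [
    (1, "Bread"), (1, "Milk"), (1, "Butter"),
    (2, "Milk"), (2, "Eggs"),
]

# Group items by transaction ID.
baskets = defaultdict(set)
for tid, item in long_rows:
    baskets[tid].add(item)

# One-hot encode into wide format: one column per item, 1 = present, 0 = absent.
items = sorted({item for _, item in long_rows})
wide = {tid: [1 if i in basket else 0 for i in items]
        for tid, basket in sorted(baskets.items())}

print(items)    # ['Bread', 'Butter', 'Eggs', 'Milk']
print(wide[1])  # [1, 1, 0, 1]
print(wide[2])  # [0, 0, 1, 1]
```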
Feature Configuration
Feature Columns (required)
- Wide format: List all item columns
- Long format: Select the single column containing item names
Transaction ID Column (required for long format) Column that identifies which transaction each item belongs to.
Contains Multiple Items (long format only) Check if a single row can contain multiple items (e.g., "Bread, Milk, Eggs").
Item Separator (if Contains Multiple Items is checked) Character separating multiple items (default: comma).
- Example: "Bread, Milk, Eggs" uses "," as separator
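Splitting such a row is straightforward; this tiny sketch assumes the default comma separator shown above.

```python
# Hypothetical multi-item cell and the default separator.
row = "Bread, Milk, Eggs"
separator = ","

# Split on the separator and strip surrounding whitespace from each item.
items = [part.strip() for part in row.split(separator)]
print(items)  # ['Bread', 'Milk', 'Eggs']
```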
Segmentation (Optional)
Segmentation Column Analyze different customer segments separately:
- Store locations (downtown vs. suburban)
- Customer types (premium vs. regular)
- Time periods (weekday vs. weekend)
Target Segment Value Filter to analyze only specific segment.
Model Parameters
Minimum Support (default: 0.02, required) Threshold for how frequently an itemset must appear.
- 0.02 = 2% of transactions
- Lower values: Find rare patterns, but slower and more results
- Higher values: Only common patterns, faster
- Recommendations:
- Large stores (>10k transactions): 0.001-0.01 (0.1%-1%)
- Medium stores: 0.01-0.05 (1%-5%)
- Small datasets: 0.05-0.1 (5%-10%)
Maximum Itemset Length (default: 3, required) Maximum number of items in a pattern.
- 2: Pairs only (A -> B)
- 3: Triples (A, B -> C)
- 4+: Complex patterns (slower, harder to interpret)
- Recommendations:
- Start with 2-3 for interpretability
- Increase only if needed
Rule Evaluation Metric (default: "lift", required) How to measure rule strength:
- lift: Strength of association (recommended)
- confidence: Reliability of rule
- leverage: Difference between observed and expected co-occurrence
- conviction: Dependency strength
Metric Threshold (default: 1.2, required) Minimum value for the selected metric to keep a rule.
- For lift: >1.0 (1.2 = 20% more likely)
- For confidence: 0.5-0.9 (50%-90% probability)
Advanced Filtering (Optional)
Enable Advanced Filtering Set both confidence and lift thresholds simultaneously for stricter rules.
Minimum Confidence (default: 0.6) Probability that Y is purchased given X is purchased.
- 0.6 = 60% of transactions with X also have Y
- Range: 0.1-1.0
Minimum Lift (default: 1.1) How much more likely Y is with X versus without X.
- 1.0 = No association (independent)
- 1.1 = 10% increase in likelihood
- 2.0 = 2x more likely
- Range: >0.0 (typically >1.0 for meaningful rules)
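A sketch of what advanced filtering does: keep only rules that clear both thresholds at once. The rule tuples and their metric values below are invented for illustration.

```python
# Hypothetical mined rules as (antecedent, consequent, confidence, lift).
rules = [
    ({"Bread"}, {"Butter"}, 0.60, 1.50),
    ({"Milk"}, {"Eggs"}, 0.45, 1.05),
    ({"Bread", "Milk"}, {"Butter"}, 0.70, 1.75),
]

MIN_CONFIDENCE = 0.6
MIN_LIFT = 1.1

# Advanced filtering keeps only rules that pass BOTH thresholds.
kept = [r for r in rules if r[2] >= MIN_CONFIDENCE and r[3] >= MIN_LIFT]
print(len(kept))  # 2 (the Milk -> Eggs rule fails both thresholds)
```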
Understanding Association Metrics
Support
Definition: How frequently an itemset appears in the database.
Formula: support(X) = (transactions containing X) / (total transactions)
Example:
- 100 transactions total
- [Bread, Milk] appears in 20 transactions
- support([Bread, Milk]) = 20/100 = 0.2 = 20%
Interpretation:
- 0.01 (1%): Rare pattern
- 0.05 (5%): Moderate frequency
- 0.2 (20%): Very common pattern
Use: Filter out rare, potentially spurious patterns
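The formula is a one-liner in code. The ten transactions below are invented, chosen so that the numbers in the worked Bread/Butter examples later in this section fall out: support(Bread) = 0.5, support(Butter) = 0.4, support([Bread, Butter]) = 0.3.

```python
# Illustrative dataset of 10 transactions (items as sets).
transactions = [
    {"Bread", "Butter"}, {"Bread", "Butter"}, {"Bread", "Butter", "Milk"},
    {"Bread"}, {"Bread", "Milk"}, {"Butter"},
    {"Milk"}, {"Eggs"}, {"Milk", "Eggs"}, {"Eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"Bread"}, transactions))            # 0.5
print(support({"Bread", "Butter"}, transactions))  # 0.3
```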
Confidence
Definition: Probability of finding Y in transactions that contain X.
Formula: confidence(X -> Y) = support(X U Y) / support(X)
Example:
- support([Bread]) = 0.5 (50% of transactions)
- support([Bread, Butter]) = 0.3 (30% of transactions)
- confidence(Bread -> Butter) = 0.3 / 0.5 = 0.6 = 60%
Interpretation:
- 0.6 = 60% of customers who buy bread also buy butter
- Higher confidence = more reliable rule
Limitation: Can be misleading if Y is very common
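In code, confidence follows directly from two support values; the numbers here are the ones from the worked example above.

```python
def confidence(sup_xy, sup_x):
    """confidence(X -> Y) = support(X U Y) / support(X)."""
    return sup_xy / sup_x

# support([Bread, Butter]) = 0.3, support([Bread]) = 0.5
print(confidence(0.3, 0.5))  # 0.6
```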
Lift
Definition: How much more likely Y is with X versus without X.
Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)
Example:
- confidence(Bread -> Butter) = 0.6
- support(Butter) = 0.4 (40% buy butter overall)
- lift(Bread -> Butter) = 0.6 / 0.4 = 1.5
Interpretation:
- lift = 1.0: No association (X and Y are independent)
- lift > 1.0: Positive association (Y more likely with X)
- 1.5 = 50% increase in likelihood
- 2.0 = 2x more likely (100% increase)
- lift < 1.0: Negative association (Y less likely with X)
Why Lift is Best for Discovery:
- Accounts for item popularity
- Detects true associations vs. coincidence
- Symmetric: lift(X -> Y) = lift(Y -> X)
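The same worked numbers in code, including a check of the symmetry property: computing the rule in the other direction (Butter -> Bread) yields the same lift.

```python
def lift(conf_xy, sup_y):
    """lift(X -> Y) = confidence(X -> Y) / support(Y)."""
    return conf_xy / sup_y

# confidence(Bread -> Butter) = 0.6, support(Butter) = 0.4
print(round(lift(0.6, 0.4), 6))  # 1.5

# Symmetry: confidence(Butter -> Bread) = 0.3 / 0.4 = 0.75, support(Bread) = 0.5
print(round(lift(0.3 / 0.4, 0.5), 6))  # 1.5
```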
Leverage
Definition: Difference between observed and expected co-occurrence.
Formula: leverage(X -> Y) = support(X U Y) - support(X) x support(Y)
Example:
- support([Bread, Butter]) = 0.3 (observed)
- support(Bread) x support(Butter) = 0.5 x 0.4 = 0.2 (expected if independent)
- leverage = 0.3 - 0.2 = 0.1
Interpretation:
- 0: No association
- Positive: Items appear together more than expected
- Negative: Items appear together less than expected
- Magnitude matters: Higher absolute value = stronger relationship
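Leverage is the observed joint support minus the support expected under independence; the numbers below come from the worked example above.

```python
def leverage(sup_xy, sup_x, sup_y):
    """leverage(X -> Y) = support(X U Y) - support(X) * support(Y)."""
    return sup_xy - sup_x * sup_y

# observed 0.3 vs. expected 0.5 * 0.4 = 0.2 under independence
print(round(leverage(0.3, 0.5, 0.4), 6))  # 0.1
```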
Conviction
Definition: Dependency measure - how strongly Y depends on X.
Formula: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))
Example:
- support(Butter) = 0.4
- confidence(Bread -> Butter) = 0.6
- conviction = (1 - 0.4) / (1 - 0.6) = 0.6 / 0.4 = 1.5
Interpretation:
- 1.0: No association (independent)
- > 1.0: Y depends on X
- infinity: Perfect dependency (always Y when X)
Use: Measures how much the rule deviates from independence
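A sketch in code, using the worked numbers above. The explicit infinity guard covers the perfect-dependency case, where confidence = 1.0 and the formula would divide by zero.

```python
def conviction(sup_y, conf_xy):
    """conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))."""
    if conf_xy == 1.0:
        return float("inf")  # perfect dependency: Y always appears with X
    return (1 - sup_y) / (1 - conf_xy)

# support(Butter) = 0.4, confidence(Bread -> Butter) = 0.6
print(round(conviction(0.4, 0.6), 6))  # 1.5
```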
Configuration Tips
Best Practices for Relim
Memory-Constrained Scenarios:
- Relim is your best choice when memory is limited
- Use min_support >= 0.01 for best memory efficiency
- Monitor memory usage during execution
- Consider segmentation to process data in chunks
Optimal Settings:
- min_support = 0.01-0.02 (good balance)
- max_length = 3 (standard depth)
- Enable advanced filtering to reduce output size
Performance Characteristics:
- Typically 20-40% slower than FP-Growth
- Uses 30-50% less memory than FP-Growth
- More predictable memory usage than other algorithms
- Good for consistent performance
When Relim is the Right Choice
Ideal Scenarios:
- Cloud instances with memory limits
- Embedded systems
- Mobile or edge computing
- Need predictable resource usage
- Memory is more constrained than CPU
Example Use Cases:
- IoT devices analyzing local transaction data
- Mobile apps with on-device mining
- Cost-optimized cloud deployments
- Systems with hard memory limits
Common Issues and Solutions
Performance Slower than Expected
Symptom: Relim takes longer than anticipated
Explanation: Relim trades some speed for memory efficiency. This is expected behavior.
Solutions:
- If speed is critical, switch to FP-Growth
- Increase min_support to reduce search space
- Reduce max_length
- Ensure memory constraints actually require Relim
Memory Usage Higher than Expected
Symptom: Still hitting memory limits
Causes:
- Very low min_support
- Very large dataset
- High max_length setting
Solutions:
- Increase min_support to 0.02 or higher
- Reduce max_length to 2-3
- Use segmentation to process in chunks
- Pre-filter to fewer items
- Consider data sampling
Results Differ from Other Algorithms
Symptom: Different itemsets found
Note: All algorithms should find identical frequent itemsets above threshold. If they differ:
- Verify parameters match exactly
- Check data preprocessing is identical
- Ensure min_support is the same
- Order may differ, but content should match
Recursion Depth Issues
Symptom: Maximum recursion depth exceeded errors
Causes:
- Extremely low min_support
- Very high max_length
- Unusual data characteristics
Solutions:
- Increase min_support
- Reduce max_length to 3 or less
- Switch to FP-Growth if problem persists
- Check for data quality issues