Documentation (English)

Association Rules Recommendation

Recommendation based on frequent itemset mining and association rules (e.g., 'users who bought X also bought Y'). Great for product bundling.

When to use:

  • E-commerce cross-selling and upselling
  • Market basket analysis
  • Product bundling strategies
  • Need highly explainable rules

Strengths: Extremely explainable, discovers strong co-occurrence patterns, good for bundling, simple to understand.

Weaknesses: Not personalized, requires frequent co-occurrences, can miss rare but valuable associations.

How it Works

Association rule mining discovers patterns such as "users who bought X also bought Y" by analyzing transaction histories. The algorithm works in three steps:

  1. Frequent Itemset Mining: Identifies sets of items that appear together in at least a minimum share of transactions (Minimum Support)
  2. Rule Generation: Creates rules from frequent itemsets (X -> Y)
  3. Rule Filtering: Keeps only rules meeting minimum confidence and lift thresholds
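The three steps above can be illustrated with a small, self-contained sketch. It uses a brute-force, level-wise (Apriori-style) search, which is an assumption made for illustration; the product may use a different miner such as FP-Growth. The function name mine_rules, the max_len cap, and the rule tuple layout (antecedent, consequent, support, confidence, lift) are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.01, min_confidence=0.3,
               min_lift=1.0, max_len=3):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions containing every item of `itemset`
        return sum(1 for b in baskets if itemset <= b) / n

    # Step 1: level-wise (Apriori-style) frequent itemset mining
    frequent = {}
    candidates = {frozenset([i]) for b in baskets for i in b}
    size = 1
    while candidates and size <= max_len:
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets of the previous size into candidates one item larger
        candidates = {a | b for a in level for b in level if len(a | b) == size}

    # Steps 2 and 3: split each frequent itemset into X -> Y, then filter
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                x = frozenset(antecedent)
                y = itemset - x
                confidence = supp / frequent[x]   # P(Y | X)
                lift = confidence / frequent[y]   # P(Y | X) / P(Y)
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((x, y, supp, confidence, lift))
    return rules
```

Every subset of a frequent itemset is itself frequent, so the lookups frequent[x] and frequent[y] always succeed. Scanning every basket to count support keeps the sketch short but costs roughly (candidates × transactions), which is why the scalability advice later in this page recommends higher support, itemset size limits, or FP-Growth on large data.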

Key Metrics:

  • Support: Fraction of all transactions that contain both X and Y (P(X ∩ Y))
  • Confidence: Of the transactions that contain X, the fraction that also contain Y (P(Y|X))
  • Lift: How much more likely Y is when X is present than Y's baseline rate (P(Y|X) / P(Y)); a lift of 1.0 means X and Y are independent
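To make the three definitions concrete, here is a minimal sketch that counts them directly from a list of baskets. The function name rule_metrics is hypothetical, and it assumes X and Y each occur in at least one transaction (otherwise the divisions fail).

```python
def rule_metrics(transactions, x, y):
    """Support, confidence and lift of the rule X -> Y.

    Assumes X and Y each occur in at least one transaction.
    """
    x, y = set(x), set(y)
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    n_x = sum(1 for b in baskets if x <= b)          # transactions containing X
    n_y = sum(1 for b in baskets if y <= b)          # transactions containing Y
    n_xy = sum(1 for b in baskets if (x | y) <= b)   # transactions containing both
    support = n_xy / n               # P(X ∩ Y)
    confidence = n_xy / n_x          # P(Y | X)
    lift = confidence / (n_y / n)    # P(Y | X) / P(Y)
    return support, confidence, lift
```

For four baskets where bread and butter appear together three times, milk three times, and all three items twice, the rule {bread, butter} -> {milk} gets support 0.5, confidence 2/3, and lift 8/9, i.e. a slightly negative association.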

Example Rule:

{bread, butter} -> {milk}
Support: 0.05 (5% of transactions contain bread, butter, and milk)
Confidence: 0.65 (65% of bread+butter buyers also buy milk)
Lift: 1.3 (milk is 30% more likely in a bread+butter basket than in an average basket)
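The three numbers in this example are linked by the definitions above, so the baseline rates they imply can be recovered with a little arithmetic:

```latex
P(\text{milk}) = \frac{\text{confidence}}{\text{lift}} = \frac{0.65}{1.3} = 0.5
\qquad
P(\text{bread} \cap \text{butter}) = \frac{\text{support}}{\text{confidence}} = \frac{0.05}{0.65} \approx 0.077
```

In other words, milk appears in half of all transactions, bread and butter together in roughly 7.7%, and the rule raises the milk rate from 50% to 65%.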

Parameters

Feature Configuration

Feature Columns (required) List of columns to use; it must include the transaction column (user_id or transaction_id) and item_id.

User Column (default: "user_id", required) Name of the column containing transaction identifiers. Each unique value represents a transaction or user session.

  • Can be transaction_id, session_id, basket_id, or user_id
  • Groups items that were purchased/interacted together

Item Column (default: "item_id", required) Name of the column containing item identifiers. Rows that share the same User Column value are grouped into one transaction, and their items are treated as co-occurring.
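As a rough illustration of how the User Column and Item Column define transactions, the sketch below groups row dictionaries into baskets. The function name and the row-of-dicts input are assumptions; it also assumes items are treated as simply present or absent, so repeated purchases of the same item in one transaction count once.

```python
from collections import defaultdict


def build_transactions(rows, user_col="user_id", item_col="item_id"):
    """Group rows into {transaction_id: set_of_items}."""
    baskets = defaultdict(set)
    for row in rows:
        # A set collapses duplicate items within the same transaction
        baskets[row[user_col]].add(row[item_col])
    return dict(baskets)
```

For example, build_transactions(rows, user_col="order_id") would treat every order as one basket, matching the "Can be transaction_id, session_id, basket_id, or user_id" guidance above.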

Model-Specific Parameters

Minimum Support (default: 0.01) Minimum frequency threshold for itemsets (0 to 1). Itemsets appearing less frequently are ignored.

  • 0.001-0.01: Very rare patterns (large catalogs)
  • 0.01-0.05: Moderate patterns (default range)
  • 0.05-0.1: Only common patterns
  • Too low: Too many rules, noise
  • Too high: Miss interesting patterns

Minimum Confidence (default: 0.3) Minimum confidence threshold for rules (0 to 1). Rules with lower confidence are filtered out.

  • 0.1-0.3: Exploratory, capture weak associations
  • 0.3-0.5: Balanced (default range)
  • 0.5-0.8: Strong associations only
  • 0.8+: Very strict, few rules
  • Higher = more reliable but fewer recommendations

Minimum Lift (default: 1.0) Minimum lift threshold for rules. Rules with lift < 1.0 indicate negative association.

  • 1.0: Drops only negatively associated rules (lift < 1.0) and keeps neutral and positive ones (default)
  • 1.2-1.5: Slight positive association
  • 1.5-2.0: Moderate positive association
  • 2.0+: Strong positive association
  • Higher = stronger patterns but fewer rules

Top-K Recommendations (default: 10) Number of items to recommend per user/transaction based on discovered rules.

  • 3-5: Focused bundling suggestions
  • 5-10: Standard cross-sell recommendations
  • 10-20: Broader exploration
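One plausible way to turn discovered rules into Top-K recommendations for a given basket is sketched below: apply every rule whose antecedent is already in the basket, score each new consequent item by the highest confidence among matching rules, and keep the K best. This max-confidence scoring and the rule tuple layout (antecedent, consequent, support, confidence, lift) are assumptions; the product may rank by lift or combine several metrics.

```python
def recommend(basket, rules, k=10):
    """Top-k items implied by rules whose antecedent is contained in `basket`."""
    basket = set(basket)
    scores = {}
    for antecedent, consequent, support, confidence, lift in rules:
        if set(antecedent) <= basket:
            for item in consequent:
                if item not in basket:  # never recommend what is already in the basket
                    scores[item] = max(scores.get(item, 0.0), confidence)
    # Highest score first; ties broken alphabetically for deterministic output
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]
```

Lowering k toward 3-5 yields the focused bundling suggestions described above, since only the highest-confidence consequents survive.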

Configuration Tips

Dataset Size Considerations

  • Small (<10k transactions): Support: 0.03-0.05, may not find many patterns
  • Medium (10k-100k): Support: 0.01-0.03, ideal range
  • Large (100k-1M): Support: 0.005-0.01, many patterns
  • Very Large (>1M): Support: 0.001-0.005; lower values surface rarer patterns but increase runtime and memory use
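Because min_support is a fraction, its practical effect depends on dataset size. This small helper (a hypothetical name) converts it into the absolute number of transactions an itemset must appear in; for example, 0.01 on 10k transactions means at least 100 transactions.

```python
import math


def min_transaction_count(min_support, n_transactions):
    """Smallest number of transactions an itemset must appear in to count as frequent."""
    # Subtract a tiny epsilon so float noise (e.g. 100.00000000000001) does not round up
    return math.ceil(min_support * n_transactions - 1e-9)
```

Applied to the table above, 0.001 on 1M transactions still requires 1,000 occurrences, which is why very large datasets can afford much lower fractional thresholds.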

Parameter Tuning Guidance

Balancing Support and Confidence:

  1. Too few rules:

    • Decrease min_support (find rarer patterns)
    • Decrease min_confidence (allow weaker associations)
    • Decrease min_lift
  2. Too many rules:

    • Increase min_support (only frequent patterns)
    • Increase min_confidence (stronger associations)
    • Increase min_lift (stronger relationships)
  3. Good starting point:

    • Support: 0.01 (1% of transactions)
    • Confidence: 0.3 (30% conditional probability)
    • Lift: 1.0 (excludes only negative associations)
    • Adjust based on number and quality of rules

Optimization Process:

  1. Start with defaults
  2. Check number of rules generated (aim for 100-1000)
  3. Review top rules by confidence and lift
  4. Adjust thresholds to balance quantity and quality
  5. Validate rules make business sense
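Steps 1 to 4 can be partly automated. The sketch below adjusts only min_support, halving or doubling it until a caller-supplied count_rules(min_support) function (hypothetical, e.g. a wrapper around your rule miner) returns a count in the 100-1000 range; confidence and lift could be tuned the same way. Step 5, checking that the rules make business sense, still needs a human.

```python
def tune_support(count_rules, min_support=0.01, target=(100, 1000),
                 factor=2.0, max_steps=10):
    """Rescale min_support until count_rules(min_support) lands in `target`."""
    low, high = target
    for _ in range(max_steps):
        n_rules = count_rules(min_support)
        if n_rules < low:
            min_support /= factor   # too few rules: allow rarer patterns
        elif n_rules > high:
            min_support *= factor   # too many rules: keep only frequent patterns
        else:
            return min_support, n_rules
    # Give up after max_steps and report where the search stopped
    return min_support, count_rules(min_support)
```

The max_steps cap matters because a narrow target range can make the search oscillate between two support values.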

When to Choose This Over Alternatives

  • vs. Item-Based KNN: Choose this for explicit co-purchase patterns and bundling
  • vs. Collaborative Filtering: Choose this for non-personalized, general patterns
  • vs. Content-Based: Choose this when you don't have item features
  • Best for: Cross-selling, bundling, "frequently bought together", market basket analysis

Common Issues and Solutions

Too Few Rules

Issue: Not finding enough association rules. Solution:

  • Lower min_support (try 0.005-0.01)
  • Lower min_confidence (try 0.2-0.3)
  • Ensure sufficient transaction data (10k+ transactions)
  • Check that transactions have multiple items
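A quick diagnostic for the last two points is to count transactions and the share of them that contain at least two distinct items, since single-item transactions cannot contribute to any rule. The function name and the returned dictionary keys are illustrative.

```python
def basket_size_report(transactions):
    """Number of transactions and the share containing at least two distinct items."""
    sizes = [len(set(t)) for t in transactions]  # set() ignores repeated items
    n = len(sizes)
    multi = sum(1 for size in sizes if size >= 2)
    return {"transactions": n, "multi_item_share": multi / n if n else 0.0}
```

A low multi_item_share usually means the User Column is too fine-grained (for example, one row per click instead of one per session).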

Too Many Rules

Issue: Overwhelming number of rules, many low-quality. Solution:

  • Increase min_support (try 0.02-0.05)
  • Increase min_confidence (try 0.4-0.6)
  • Increase min_lift (try 1.5-2.0)
  • Filter by antecedent size (prefer rules with fewer items on the left-hand side)
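Assuming rules are available as (antecedent, consequent, support, confidence, lift) tuples (an illustrative layout, not necessarily the product's output format), the stricter thresholds and the antecedent-size filter could be applied in one pass like this. The confidence and lift defaults come from the ranges suggested above; max_antecedent=2 is an arbitrary illustrative choice.

```python
def prune_rules(rules, min_confidence=0.4, min_lift=1.5, max_antecedent=2):
    """Keep strong, simple rules, sorted by lift then confidence (highest first)."""
    kept = [
        (ante, cons, supp, conf, lift)
        for ante, cons, supp, conf, lift in rules
        if conf >= min_confidence and lift >= min_lift and len(ante) <= max_antecedent
    ]
    return sorted(kept, key=lambda r: (r[4], r[3]), reverse=True)
```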

Not Personalized

Issue: Same recommendations for everyone. Solution:

  • This is expected behavior for association rules
  • Combine with collaborative filtering for personalization
  • Use user's current basket to select relevant rules
  • Consider Item-Based KNN or Hybrid model for personalization

Recommendations Too Obvious

Issue: Rules only capture obvious patterns (e.g., batteries with electronics). Solution:

  • Increase min_lift to find surprising associations
  • Filter out trivial category-level patterns
  • Focus on cross-category recommendations
  • Look for rules with high lift but moderate support
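The sketch below combines three of these ideas: a lift floor, a support ceiling to favor moderate-support rules, and a cross-category check. The item-to-category mapping and both thresholds are hypothetical inputs you would supply, and the rule tuple layout (antecedent, consequent, support, confidence, lift) is also an assumption.

```python
def surprising_rules(rules, category, min_lift=2.0, max_support=0.05):
    """High-lift, moderate-support rules whose consequent is in a new category."""
    out = []
    for ante, cons, support, conf, lift in rules:
        ante_cats = {category[i] for i in ante}
        cons_cats = {category[i] for i in cons}
        # Skip rules that stay inside the antecedent's categories (often obvious)
        if lift >= min_lift and support <= max_support and not (ante_cats & cons_cats):
            out.append((ante, cons, support, conf, lift))
    return out
```

With electronics-only rules filtered out, a pairing such as laptop -> desk can surface even though its support is lower than laptop -> batteries.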

Seasonal/Temporal Patterns Missed

Issue: Rules don't capture time-based patterns. Solution:

  • Generate separate rules for different time periods
  • Weight recent transactions more heavily
  • Use sliding time windows
  • Consider BERT4Rec for sequential patterns
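Weighting recent transactions can be done by giving each transaction an exponentially decaying weight and computing a weighted support. The sketch assumes each transaction is a (timestamp_in_days, items) pair; the half-life parameter and the function name are illustrative choices, not product settings.

```python
def decayed_support(transactions, itemset, now, half_life_days=30.0):
    """Support where each transaction is weighted by 0.5 ** (age / half_life)."""
    itemset = set(itemset)
    total = hit = 0.0
    for timestamp_days, items in transactions:
        weight = 0.5 ** ((now - timestamp_days) / half_life_days)
        total += weight
        if itemset <= set(items):
            hit += weight
    return hit / total if total else 0.0
```

A sliding window is the limiting case of this idea: weight 1 inside the window and 0 outside.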

Scalability Issues

Issue: Rule mining too slow on large datasets. Solution:

  • Increase min_support to reduce candidate itemsets
  • Sample transactions for initial exploration
  • Limit maximum itemset size
  • Use an FP-Growth variant (more efficient than candidate-generation approaches such as Apriori)
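For the sampling step, a seeded uniform sample keeps exploratory runs reproducible. This is a minimal sketch with a hypothetical function name; a sample may miss rare itemsets, so final rules should still be mined on the full data.

```python
import random


def sample_transactions(transactions, fraction=0.1, seed=42):
    """Return a reproducible uniform random sample of the transactions."""
    transactions = list(transactions)  # random.sample needs a sequence
    if not transactions:
        return []
    k = max(1, round(len(transactions) * fraction))
    return random.Random(seed).sample(transactions, k)
```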

Example Use Cases

E-commerce Cross-Selling

Scenario: An online retailer wants to suggest complementary products at checkout.

Configuration:

  • Min Support: 0.02 (2% of transactions)
  • Min Confidence: 0.4 (40% likelihood)
  • Min Lift: 1.5 (50% more likely than the baseline rate)
  • Top-5 recommendations
  • Transaction ID: order_id

Why: "Frequently bought together" suggestions are highly explainable and drive upsells.

Grocery Store Bundling

Scenario: A supermarket chain wants to create product bundles and optimize store layout.

Configuration:

  • Min Support: 0.05 (5% of baskets)
  • Min Confidence: 0.35
  • Min Lift: 1.3
  • Top-10 recommendations
  • Transaction ID: basket_id

Why: Market basket analysis discovers co-purchase patterns that inform merchandising.

Streaming Service Content Bundles

Scenario: A video platform wants to suggest what to "watch next" based on viewing sessions.

Configuration:

  • Min Support: 0.01 (1% of sessions)
  • Min Confidence: 0.3
  • Min Lift: 1.2
  • Top-8 recommendations
  • Transaction ID: session_id (videos watched in the same session)

Why: Discovers content that is often watched together, which helps build playlists and surface binge-watching patterns.
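The three example configurations can also be written down as data and validated before use. The AssociationRulesConfig dataclass and its field names below are hypothetical, chosen to mirror the parameters on this page, not the product's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRulesConfig:
    transaction_column: str
    min_support: float = 0.01
    min_confidence: float = 0.3
    min_lift: float = 1.0
    top_k: int = 10
    item_column: str = "item_id"

    def __post_init__(self):
        # Support and confidence are fractions; lift and top_k have simple lower bounds
        if not (0 < self.min_support <= 1 and 0 <= self.min_confidence <= 1):
            raise ValueError("min_support must be in (0, 1] and min_confidence in [0, 1]")
        if self.min_lift < 0 or self.top_k < 1:
            raise ValueError("min_lift must be >= 0 and top_k >= 1")


EXAMPLES = {
    "ecommerce_cross_sell": AssociationRulesConfig("order_id", 0.02, 0.4, 1.5, 5),
    "grocery_bundling": AssociationRulesConfig("basket_id", 0.05, 0.35, 1.3, 10),
    "streaming_sessions": AssociationRulesConfig("session_id", 0.01, 0.3, 1.2, 8),
}
```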
