Hybrid Recommendation (CF + Content-Based)

Hybrid recommendation combines collaborative filtering (Item-KNN) with content-based filtering (TF-IDF) by taking a weighted average of the two scores, so that both user behavior and item descriptions contribute to every recommendation.

When to use:

  • Have both interaction data AND item descriptions
  • Want balanced recommendations (discovery + relevance)
  • Need to handle cold start gracefully
  • Want a model that is more robust than CF or content-based filtering alone

Strengths: Handles cold start, combines discovery and relevance, more robust, better coverage

Weaknesses: More complex, requires both data types, harder to tune, slower than single methods

How it Works

The Hybrid model combines two complementary approaches:

Collaborative Filtering (Item-Based KNN): Learns from user behavior patterns

  • "Users who liked X also liked Y"
  • Captures collective wisdom and trends
  • Good for discovery and popularity signals
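The "users who liked X also liked Y" idea can be sketched as a cosine similarity between items, computed from who interacted with what. This is a minimal illustration that assumes binary (user, item) interactions; the function name and sample data are hypothetical, and the product's actual Item-KNN implementation may differ.

```python
from collections import defaultdict
from math import sqrt

def item_cosine_similarity(interactions):
    """Cosine similarity between items from binary (user, item) pairs."""
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(users_by_item[a] & users_by_item[b])
            if shared:
                # For binary vectors, cosine = |A ∩ B| / sqrt(|A| * |B|)
                s = shared / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = s
    return sims

# Hypothetical sample data: X and Y share two users, X and Z share one.
interactions = [
    ("u1", "X"), ("u1", "Y"),
    ("u2", "X"), ("u2", "Y"),
    ("u3", "X"), ("u3", "Z"),
]
sims = item_cosine_similarity(interactions)
print(sims[("X", "Y")] > sims[("X", "Z")])
```

Items that never share a user (Y and Z here) get no entry at all, which is why a pure CF model has nothing to say about items without interaction history.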

Content-Based (TF-IDF): Learns from item features

  • "Items with similar descriptions"
  • Handles new items without interaction history
  • Captures intrinsic item properties
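To make "items with similar descriptions" concrete, the sketch below builds TF-IDF vectors from raw text and compares them with cosine similarity. It assumes whitespace tokenization and a smoothed IDF; the function names and sample descriptions are hypothetical, and the actual tokenizer and weighting used by the model are not specified here.

```python
from collections import Counter
from math import log, sqrt

def tfidf_vectors(docs):
    """Map each item to a sparse {token: tf-idf weight} dict."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions each token appears
    doc_freq = Counter(tok for toks in tokenized.values() for tok in set(toks))
    vectors = {}
    for item, toks in tokenized.items():
        counts = Counter(toks)
        vectors[item] = {
            tok: (c / len(toks)) * (log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, c in counts.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical item descriptions
docs = {
    "A": "wireless noise cancelling headphones",
    "B": "wireless bluetooth headphones",
    "C": "cast iron frying pan",
}
vecs = tfidf_vectors(docs)
print(cosine(vecs["A"], vecs["B"]) > cosine(vecs["A"], vecs["C"]))
```

Because the similarity depends only on the text, a brand-new item can be scored as soon as it has a description.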

The final recommendation score is a weighted combination:

score = (alpha × collaborative_score) + ((1 - alpha) × content_score)

This allows you to balance between behavior-based patterns (CF) and content similarity (CB).
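As a small worked example of the weighted combination, the sketch below applies the formula to one item. The function name is hypothetical, and it assumes both component scores are already on a comparable scale (for example, 0 to 1).

```python
def hybrid_score(cf_score, content_score, alpha=0.5):
    """Blend a CF score and a content score with weight alpha on CF."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * cf_score + (1.0 - alpha) * content_score

# With alpha=0.7 the CF score dominates: 0.7 * 0.8 + 0.3 * 0.4 is about 0.68
blended = hybrid_score(0.8, 0.4, alpha=0.7)
print(round(blended, 2))
```

At alpha=0.0 the function returns the content score unchanged, and at alpha=1.0 it returns the CF score unchanged.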

Parameters

Feature Configuration

Feature Columns (required) List of columns to use. Must include the user, item, and content columns (by default user_id, item_id, and description).

User Column (default: "user_id", required) Name of the column containing user identifiers. Each unique value represents a different user.

Item Column (default: "item_id", required) Name of the column containing item identifiers. Each unique value represents a different item to recommend.

Content Column (default: "description", required) Name of the column containing item descriptions or features. Used for content-based component.

  • Product descriptions, article text, movie plots, etc.
  • Higher quality content = better recommendations
  • Can concatenate multiple fields
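Concatenating multiple fields into a single content column might look like the sketch below. The field names and the helper are hypothetical; the point is only that empty or missing fields are skipped so they do not add noise to the text.

```python
def build_content(item, fields=("title", "description", "category")):
    """Join the non-empty text fields of an item into one content string."""
    parts = [str(item[f]).strip() for f in fields if item.get(f)]
    return " ".join(parts)

# Hypothetical item record with one missing field
item = {
    "title": "Trail Shoes",
    "description": "Lightweight running shoes",
    "category": None,
}
content = build_content(item)
print(content)
```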

Rating Column (optional) Name of the column containing ratings. If provided, the ratings are used to weight the collaborative filtering component. If not provided, interactions are treated as implicit feedback.

Model-Specific Parameters

CF Weight (Alpha) (default: 0.5) Weight for collaborative filtering component (0 to 1). Controls the balance between CF and content-based.

  • 0.0: Pure content-based (only item features)
  • 0.3: Content-heavy (70% content, 30% CF)
  • 0.5: Balanced (50/50 mix) - default
  • 0.7: CF-heavy (70% CF, 30% content)
  • 1.0: Pure collaborative filtering (only interactions)

Top-K Recommendations (default: 10) Number of items to recommend for each user.

  • 5-10: Focused recommendations
  • 10-20: Standard recommendation lists
  • 20-50: For exploration and diversity
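Selecting the top K items from a set of hybrid scores can be sketched as follows. Excluding items the user has already interacted with is an assumption made for this example, not documented behavior, and the function name is hypothetical.

```python
import heapq

def top_k(scores, seen, k=10):
    """Return the k highest-scoring items the user has not seen yet."""
    candidates = ((score, item) for item, score in scores.items() if item not in seen)
    return [item for score, item in heapq.nlargest(k, candidates)]

# Hypothetical hybrid scores for one user who has already interacted with "a"
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
recommended = top_k(scores, seen={"a"}, k=2)
print(recommended)
```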

Configuration Tips

Dataset Size Considerations

  • Small (<10k interactions): Use alpha=0.3-0.4 (favor content)
  • Medium (10k-100k): Use alpha=0.5 (balanced)
  • Large (>100k): Use alpha=0.6-0.7 (favor CF)

Parameter Tuning Guidance

Adjust Alpha Based On:

  1. Data availability:

    • Sparse interactions -> Lower alpha (favor content)
    • Rich interactions -> Higher alpha (favor CF)
  2. Cold start frequency:

    • Many new items -> Lower alpha (content handles new items)
    • Stable catalog -> Higher alpha
  3. Content quality:

    • Rich descriptions -> Lower alpha (leverage content)
    • Poor content -> Higher alpha (rely on CF)
  4. Business goals:

    • Discovery/exploration -> Higher alpha (CF finds new patterns)
    • Relevance/similarity -> Lower alpha (content ensures fit)

Optimization Process:

  1. Start with alpha=0.5 (balanced)
  2. Evaluate Precision@K, NDCG, and Coverage
  3. If cold start is poor -> Decrease alpha
  4. If recommendations too predictable -> Increase alpha
  5. A/B test different alpha values in production
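The metrics named in step 2 can be computed offline with a few short functions. This is a sketch assuming binary relevance (an item is either relevant or not) and hypothetical function names; a real evaluation would average these values over all test users.

```python
from math import log2

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def ndcg_at_k(recommended, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def coverage(all_recommendations, catalog_size):
    """Share of the catalog that appears in at least one recommendation list."""
    distinct = {item for recs in all_recommendations for item in recs}
    return len(distinct) / catalog_size

# Hypothetical single-user example
recommended = ["a", "b", "c"]
relevant = {"a", "c"}
p = precision_at_k(recommended, relevant, k=3)
n = ndcg_at_k(recommended, relevant, k=3)
cov = coverage([["a", "b"], ["b", "c"]], catalog_size=6)
print(p, n, cov)
```

Running the same evaluation for several alpha values makes the trade-off in steps 3 and 4 visible: precision and NDCG track relevance, while coverage indicates how much of the catalog is being surfaced.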

When to Choose This Over Alternatives

  • vs. Pure CF: Choose this for better cold start handling
  • vs. Pure Content-Based: Choose this for better discovery and pattern recognition
  • vs. Matrix Factorization: Choose this for more control over CF/content balance
  • vs. Embeddings: Choose this for interpretability and simpler implementation
  • Best when: You have both interaction data AND item descriptions

Common Issues and Solutions

Imbalanced Components

Issue: One component dominates while the other adds little value.

Solution:

  • Check individual component performance separately
  • Normalize scores before combining
  • Adjust alpha to balance contributions
  • Ensure both data sources are high quality
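Normalizing scores before combining them can be sketched with min-max scaling, which maps each component to the range 0 to 1 so that neither dominates simply because of its scale. Min-max scaling is one option chosen for illustration; the function name and sample values are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {item: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so the component carries no ranking signal
        return {item: 0.0 for item in scores}
    return {item: (value - lo) / (hi - lo) for item, value in scores.items()}

# Hypothetical raw CF scores on an arbitrary scale
cf_raw = {"a": 12.0, "b": 3.0, "c": 7.5}
cf_norm = min_max_normalize(cf_raw)
print(cf_norm)
```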

Cold Start Still Poor

Issue: New items still get poor recommendations despite the content component.

Solution:

  • Decrease alpha (favor content more, try 0.3)
  • Improve content quality and richness
  • Implement pure content-based fallback for items with zero interactions
  • Collect initial interactions through featured placement

Recommendations Too Conservative

Issue: The model only recommends safe, obvious items.

Solution:

  • Increase alpha (favor CF for discovery)
  • Apply diversity post-processing
  • Add exploration bonus for less-popular items
  • Monitor and balance novelty vs. relevance
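One possible form of diversity post-processing is a greedy re-ranking in the style of maximal marginal relevance (MMR), which penalizes candidates that are too similar to items already selected. MMR is named here as an illustrative technique, not as something this model is documented to use; the function name, the `lambda_` parameter, and the sample similarities are assumptions.

```python
def mmr_rerank(scores, similarity, k, lambda_=0.7):
    """Greedily pick k items, trading relevance against redundancy."""
    selected = []
    remaining = dict(scores)
    while remaining and len(selected) < k:
        def mmr(item):
            # Redundancy is the highest similarity to anything already picked
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lambda_ * remaining[item] - (1.0 - lambda_) * max_sim
        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical data: "a" and "b" are near-duplicates, "c" is different
pair_sim = {frozenset(("a", "b")): 0.95}

def sim(x, y):
    return pair_sim.get(frozenset((x, y)), 0.1)

scores = {"a": 0.9, "b": 0.85, "c": 0.6}
diverse = mmr_rerank(scores, sim, k=2, lambda_=0.5)
print(diverse)
```

With `lambda_=1.0` the re-ranking reduces to plain score order, so the parameter gives direct control over the novelty versus relevance balance.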

Slow Performance

Issue: The hybrid model is too slow for real-time recommendations.

Solution:

  • Pre-compute both CF and content similarities
  • Cache user profiles
  • Use approximate methods
  • Consider separate models for cold start vs. established users

Difficult to Tune

Issue: It is hard to find the optimal alpha value.

Solution:

  • Use cross-validation to test alpha range (0.3, 0.5, 0.7)
  • Monitor multiple metrics (Precision@K, Coverage, Diversity)
  • Consider adaptive alpha based on item age or interaction count
  • A/B test in production
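An adaptive alpha based on interaction count could be sketched as a simple ramp: items with no interactions are scored purely on content, and the CF weight grows until it reaches a ceiling. The function name, the linear ramp, and the `max_alpha` and `saturation` values are all assumptions chosen for illustration.

```python
def adaptive_alpha(interaction_count, max_alpha=0.7, saturation=50):
    """Scale alpha from 0 up to max_alpha as an item gathers interactions."""
    confidence = min(interaction_count, saturation) / saturation
    return max_alpha * confidence

# A new item, a warming-up item, and an established item
alphas = [adaptive_alpha(n) for n in (0, 25, 500)]
print(alphas)
```

With zero interactions this yields alpha=0.0, which is equivalent to a pure content-based fallback for new items.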

Conflicting Recommendations

Issue: CF and content suggest very different items.

Solution:

  • Check data quality in both sources
  • Ensure proper normalization of scores
  • Consider using max or rank aggregation instead of weighted average
  • Investigate cases where they disagree (may reveal insights)
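Rank aggregation combines the two ranked lists instead of their raw scores, which sidesteps scale differences between the components. The sketch below uses reciprocal rank fusion (RRF) as one example of such a method; it is not stated to be part of this model, and the function name and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; items ranked high in any list score well."""
    fused = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] = fused.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by item name
    return sorted(fused, key=lambda item: (-fused[item], item))

# Hypothetical rankings from the two components
cf_ranking = ["a", "b", "c"]
content_ranking = ["c", "a", "d"]
fused_ranking = reciprocal_rank_fusion([cf_ranking, content_ranking])
print(fused_ranking)
```

Items that both components rank highly ("a" and "c" here) rise to the top, while items that only one component suggests are kept but ranked lower.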

Example Use Cases

E-commerce Product Recommendations

Scenario: Online store with 100k products, 500k users, and rich product descriptions

Configuration:

  • Alpha: 0.6 (favor CF slightly for purchase patterns)
  • Content: product_title + description + category + brand
  • Top-10 recommendations

Why: Established user base (CF) but frequent new products (content); balances discovery with relevance

Video Streaming Service

Scenario: Streaming platform with 50k videos, 2M users, and detailed video metadata

Configuration:

  • Alpha: 0.7 (favor CF for viewing patterns and trends)
  • Content: title + description + genre + cast + tags
  • Top-15 recommendations
  • Rating column: viewing duration (implicit rating)

Why: Strong interaction data from viewing behavior, but new content arrives regularly

Job Board Matching

Scenario: Job platform with 200k job postings, 1M job seekers, and detailed job descriptions

Configuration:

  • Alpha: 0.4 (favor content for skills and requirements matching)
  • Content: job_title + description + skills + requirements + location
  • Top-20 recommendations
  • Limited interaction data (users apply to few jobs)

Why: Sparse interaction data but rich job descriptions; accurate skills matching is needed
