Item-Based Collaborative Filtering (KNN)
Item-based collaborative filtering using cosine similarity on the user-item interaction matrix. Recommends items similar to those the user has already interacted with.
When to use:
- Need explainable recommendations ("Because you liked X")
- Have sufficient item interactions
- Items change less frequently than users
- Want stable recommendations
Strengths: Explainable, stable over time, scales well, handles sparsity reasonably
Weaknesses: Cannot discover novel patterns, popularity bias, cold start for new items
How it Works
Item-Based KNN computes similarity between items based on users who interacted with them. Items are considered similar if they were liked/purchased/viewed by the same users.
For each user, the algorithm:
- Identifies items the user has interacted with
- Finds similar items using pre-computed item-item similarities
- Ranks candidate items by aggregated similarity scores
- Returns top-K recommendations
Key Concept: "Users who liked item A also liked item B" - recommendations are based on item co-occurrence patterns.
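The four steps above can be sketched with NumPy. This is a minimal illustration under assumed names and toy data, not the product's actual implementation:

```python
import numpy as np

def cosine_item_similarity(interactions):
    """Cosine similarity between the item columns of a user-item matrix."""
    norms = np.linalg.norm(interactions, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero item columns untouched
    unit = interactions / norms
    return unit.T @ unit  # shape: (n_items, n_items)

def recommend(user_vector, sim, k=10):
    """Rank unseen items by aggregated similarity to the user's items."""
    scores = sim @ user_vector           # aggregate similarity scores
    scores[user_vector > 0] = -np.inf    # exclude items already seen
    top = np.argsort(scores)[::-1][:k]   # take the top-K candidates
    return [int(i) for i in top if np.isfinite(scores[i])]

# Toy data: rows are users, columns are items, 1 = interaction.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)
S = cosine_item_similarity(R)
print(recommend(R[0], S, k=2))  # candidate items for user 0
```

In practice the item-item similarity matrix is computed once over all items and reused across users, which is what keeps the method cheap at serving time.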
Parameters
Feature Configuration
Feature Columns (required) List of columns to use; must include the user and item columns (user_id and item_id by default).
User Column (default: "user_id", required) Name of the column containing user identifiers. Each unique value represents a different user.
Item Column (default: "item_id", required) Name of the column containing item identifiers. Each unique value represents a different item to recommend.
Rating Column (optional) Name of the column containing ratings. If provided, uses weighted similarity. If not provided, treats all interactions equally.
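As a sketch of how these configured columns could be pivoted into the user-item matrix the algorithm operates on (the sample data and pandas usage are illustrative assumptions, not the product's internals):

```python
import pandas as pd

# Hypothetical interaction log using the default column names.
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "item_id": ["a", "b", "b", "c"],
    "rating":  [5, 3, 4, 2],
})

# With a rating column, entries are weighted by the rating; without one,
# every observed (user, item) pair would get an implicit weight of 1.
matrix = df.pivot_table(index="user_id", columns="item_id",
                        values="rating", fill_value=0)
print(matrix)
```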
Model-Specific Parameters
Top-K Recommendations (default: 10) Number of items to recommend for each user.
- 5-10: Focused, high-confidence recommendations
- 10-20: Standard recommendation lists
- 20-50: For exploration and serendipity
Configuration Tips
Dataset Size Considerations
- Small (<1k items): Works well, but limited recommendation diversity
- Medium (1k-10k items): Ideal range, good balance of coverage and performance
- Large (10k-100k items): Still workable, but the item-item similarity computation becomes expensive
- Very Large (>100k items): Consider using approximation techniques or Matrix Factorization
Parameter Tuning Guidance
- Monitor coverage: Ensure recommendations aren't dominated by popular items
- Check diversity: Use diversity metrics to avoid filter bubbles
- Validate explainability: Recommendations should make intuitive sense
- Track novelty: Balance between safe recommendations and discovery
- A/B test: Offline metrics don't always match online engagement
When to Choose This Over Alternatives
- vs. User-Based KNN: Choose this for more stable recommendations (items change less than users)
- vs. Matrix Factorization: Choose this for explainability and when you don't need rating prediction
- vs. Content-Based: Choose this when you have sufficient interaction data
- vs. BERT4Rec: Choose this for non-sequential, simpler recommendations
- vs. Association Rules: Choose this for personalized recommendations vs. general patterns
Common Issues and Solutions
Cold Start Problem (New Items)
Issue: Cannot recommend new items with no interaction history. Solution:
- Use content-based features for new items (Hybrid model)
- Implement "exploration" strategy to gather initial interactions
- Bootstrap with item metadata similarity
- Show new items to diverse users initially
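The metadata-bootstrap idea above can be sketched with a simple Jaccard similarity over item tags (the items and tags here are invented for illustration):

```python
def jaccard(tags_a, tags_b):
    """Content similarity between two items' tag sets."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)

# Hypothetical catalog metadata; "new_item" has no interactions yet.
tags = {
    "item_a": {"scifi", "action"},
    "item_b": {"drama", "romance"},
    "new_item": {"scifi", "thriller"},
}

# Bootstrap: rank existing items by metadata similarity to the new item
# and use them as its neighbors until real interactions accumulate.
neighbors = sorted(
    (name for name in tags if name != "new_item"),
    key=lambda name: jaccard(tags["new_item"], tags[name]),
    reverse=True,
)
print(neighbors)
```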
Cold Start Problem (New Users)
Issue: No interaction history to base recommendations on. Solution:
- Show popular items initially
- Collect quick preferences through onboarding
- Use demographic or contextual signals
- Switch to content-based until sufficient interactions
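A global-popularity fallback for brand-new users is straightforward to sketch (toy data, assumed setup):

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = items.
R = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
], dtype=float)

# For a user with no history, fall back to global popularity:
# rank items by how many distinct users interacted with them.
popularity = (R > 0).sum(axis=0)
cold_start_ranking = np.argsort(popularity)[::-1].tolist()
print(cold_start_ranking)  # most popular item first
```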
Popularity Bias
Issue: Only popular items get recommended. Solution:
- Apply inverse frequency weighting
- Use diversity-aware reranking
- Set minimum interaction threshold
- Balance popularity with personalization
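Inverse frequency weighting can be sketched as an IDF-style rescaling of the interaction matrix before similarities are computed (toy data; the +1 smoothing is one common choice, not a prescribed one):

```python
import numpy as np

# Toy matrix: item 0 is ubiquitous, item 2 is niche.
R = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
], dtype=float)

n_users = R.shape[0]
item_count = (R > 0).sum(axis=0)

# IDF-style weight with +1 smoothing: items everyone has interacted with
# are shrunk toward zero, so they dominate similarity scores less.
idf = np.log((n_users + 1) / (item_count + 1))
R_weighted = R * idf
print(idf)  # larger weight for rarer items
```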
Limited Diversity
Issue: All recommendations too similar to each other. Solution:
- Use diversity-aware selection (e.g., MMR - Maximal Marginal Relevance)
- Filter out overly similar items
- Combine with other recommendation signals
- Apply category diversification
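A minimal greedy MMR reranker, assuming relevance scores and an item-item similarity matrix are already available (the lambda trade-off value is an illustrative choice):

```python
def mmr(scores, sim, k, lam=0.7):
    """Greedy MMR: balance relevance (scores) against redundancy (sim)."""
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def marginal(i):
            # Penalize similarity to anything already selected.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; plain ranking would return both.
scores = [1.0, 0.95, 0.5]
sim = [[1.0, 0.99, 0.1],
       [0.99, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
print(mmr(scores, sim, k=2, lam=0.5))  # picks item 0, then dissimilar item 2
```

With lam close to 1 this degenerates to plain relevance ranking; lowering it trades relevance for diversity.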
Scalability Issues
Issue: Computing item similarities is expensive with many items. Solution:
- Pre-compute and cache similarities
- Use approximate nearest neighbors
- Limit similarity computation to top-N similar items per item
- Consider Matrix Factorization for very large catalogs
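Limiting each item to its top-N neighbors can be sketched as a post-processing step on the dense similarity matrix (a naive version; large systems would instead build the matrix sparsely or use an ANN index):

```python
import numpy as np

def truncate_topn(sim, n):
    """Keep only each row's n largest off-diagonal similarities."""
    out = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        row = sim[i].copy()
        row[i] = 0.0                          # drop self-similarity
        top = np.argpartition(row, -n)[-n:]   # indices of the n largest
        out[i, top] = row[top]
    return out

# Toy 4x4 item-item similarity matrix.
S = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.0],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.0, 0.8, 1.0],
])
T = truncate_topn(S, 2)
print((T != 0).sum(axis=1))  # at most 2 neighbors kept per item
```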
Poor Coverage
Issue: Many items never get recommended. Solution:
- Lower similarity thresholds
- Boost less popular items
- Use exploration strategies
- Combine with other recommendation methods
Example Use Cases
E-commerce Product Recommendations
Scenario: Online retailer with 50k products, 100k users
Configuration:
- Top-10 recommendations
- Use purchase history as implicit feedback
- No rating column
Why: Explainable recommendations ("Customers who bought this also bought..."), stable product catalog
Movie Recommendations
Scenario: Streaming service with 10k movies, 500k users
Configuration:
- Top-15 recommendations
- Use viewing history
- Optional rating weights for explicit feedback
Why: The catalog is stable, users want similar content, and recommendations need to be explainable
News Article Recommendations
Scenario: News platform with 1M articles, 2M users
Configuration:
- Top-20 recommendations
- Use read history (implicit feedback)
- Weight recent interactions more heavily
Why: Although the catalog is huge, similarity can be restricted to recent articles, and users who read similar articles tend to share interests
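The recency weighting mentioned in the news example can be sketched as an exponential time decay applied to interaction weights before building the matrix (the 7-day half-life is an assumed knob, not a recommendation):

```python
import numpy as np

def decay_weight(age_days, half_life=7.0):
    """Weight of an interaction age_days old: halves every half_life days."""
    return 0.5 ** (np.asarray(age_days, dtype=float) / half_life)

# A fresh read counts fully; a two-week-old one counts a quarter as much.
print(decay_weight([0, 7, 14]))
```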