User-Based Collaborative Filtering (KNN)
Collaborative filtering based on user-user similarity. Recommends items that similar users have liked.
When to use:
- Have stable user base
- Users have clear preferences
- Need serendipity (discovering new types of items)
- Good for smaller datasets
Strengths: Good for discovery, captures user preferences holistically, explainable ("Users like you also liked").
Weaknesses: Less stable than item-based, doesn't scale as well, cold start for new users.
How it Works
User-Based KNN computes similarity between users based on their interaction patterns. Users are considered similar if they liked/purchased/viewed similar items.
For each user, the algorithm:
- Finds the most similar users (neighbors) using cosine similarity
- Identifies items those similar users liked but the target user hasn't interacted with
- Ranks candidates by aggregated similarity scores from neighbors
- Returns top-K recommendations
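The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on a binary interaction matrix (function and variable names are ours, not a specific library's API):

```python
import numpy as np

def recommend(interactions, target_user, n_neighbors=20, top_k=10):
    """User-based KNN on a binary user-item matrix.

    interactions: 2D array, rows = users, columns = items (1 = interacted).
    Returns indices of the top_k recommended items for target_user.
    """
    # 1. Cosine similarity between the target user and every other user.
    norms = np.linalg.norm(interactions, axis=1)
    norms[norms == 0] = 1.0  # guard users with empty histories
    sims = interactions @ interactions[target_user] / (norms * norms[target_user])
    sims[target_user] = -1.0  # exclude the user themselves

    # 2. Keep the n_neighbors most similar users.
    neighbors = np.argsort(sims)[::-1][:n_neighbors]

    # 3. Score each item by the summed similarity of neighbors who liked it.
    scores = sims[neighbors] @ interactions[neighbors]
    scores[interactions[target_user] > 0] = -np.inf  # drop already-seen items

    # 4. Return the top-K candidates.
    return np.argsort(scores)[::-1][:top_k]
```

On a toy matrix, the user's closest neighbor pulls in the items that neighbor liked but the target user has not yet seen.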
Key Concept: "Users with similar tastes to yours also enjoyed these items" - recommendations leverage wisdom of similar users.
Parameters
Feature Configuration
Feature Columns (required) List of columns to use; must include user_id and item_id.
User Column (default: "user_id", required) Name of the column containing user identifiers. Each unique value represents a different user.
Item Column (default: "item_id", required) Name of the column containing item identifiers. Each unique value represents a different item to recommend.
Rating Column (optional) Name of the column containing ratings. If provided, uses weighted similarity. If not provided, treats all interactions equally.
Model-Specific Parameters
Top-K Recommendations (default: 10) Number of items to recommend for each user.
- 5-10: Focused recommendations
- 10-20: Standard recommendation lists
- 20-50: For broad exploration
Number of Neighbors (default: 20) Number of similar users to consider for generating recommendations.
- 5-10: Very focused, only most similar users
- 10-20: Good balance (default)
- 20-50: Broader perspective, more diversity
- 50+: May include less relevant users
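Pulled together, the parameters above might be declared as follows. This is an illustrative sketch; the key names are hypothetical and the exact configuration format depends on the platform:

```python
# Hypothetical configuration; key names mirror the parameters described
# above, not a specific platform's schema.
config = {
    "feature_columns": ["user_id", "item_id", "rating"],  # must include user and item columns
    "user_column": "user_id",
    "item_column": "item_id",
    "rating_column": "rating",  # optional; omit to treat all interactions equally
    "top_k": 10,                # items recommended per user
    "n_neighbors": 20,          # similar users consulted per recommendation
}
```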
Configuration Tips
Dataset Size Considerations
- Small (<10k users): Works well, ideal use case
- Medium (10k-100k users): Acceptable performance, consider item-based instead
- Large (100k-1M users): Performance issues, use Matrix Factorization
- Very Large (>1M users): Not recommended, too slow
Parameter Tuning Guidance
- Adjust neighbors: More neighbors = more diversity, fewer = more precision
- Monitor stability: User preferences change, may need frequent retraining
- Balance similarity: Too strict = few recommendations, too loose = poor quality
- Track serendipity: Measure how often novel items are recommended
- Optimize performance: Pre-compute similarities, use approximate methods for large datasets
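Pre-computing similarities, the last tip above, can be sketched as follows (toy data; memory is O(n_users^2), so this suits the smaller user bases this model targets):

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = items.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Normalize rows so the matrix product below yields cosine similarity.
norms = np.linalg.norm(interactions, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # guard users with empty histories
unit = interactions / norms

# Pre-compute the full user-user similarity matrix once and cache it;
# serving a request then reduces to a row lookup plus an argsort.
sim_matrix = unit @ unit.T
np.fill_diagonal(sim_matrix, 0.0)  # a user is not their own neighbor
```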
When to Choose This Over Alternatives
- vs. Item-Based KNN: Choose this for better discovery and serendipity
- vs. Matrix Factorization: Choose this for smaller datasets and explainability
- vs. Content-Based: Choose this when you have sufficient user interaction data
- vs. BERT4Rec: Choose this for simpler, non-sequential recommendations
- vs. Hybrid: Choose this when you don't have item content features
Common Issues and Solutions
Cold Start Problem (New Users)
Issue: Cannot recommend to users with no interaction history. Solution:
- Use demographic similarity if available
- Show popular items initially
- Collect initial preferences through questionnaire
- Fall back to content-based recommendations
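The "show popular items initially" fallback can be sketched like this (a minimal illustration; the function name is ours):

```python
import numpy as np

def popular_fallback(interactions, user_idx, top_k=5):
    """Return popularity-ranked items for a user with no history,
    or None to signal that the normal KNN path should run."""
    if interactions[user_idx].sum() > 0:
        return None  # user has history: use user-based KNN instead
    popularity = interactions.sum(axis=0)  # per-item interaction counts
    return np.argsort(popularity)[::-1][:top_k]
```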
Scalability Issues
Issue: Computing user-user similarities is expensive with many users. Solution:
- Use sampling (compute similarities for subset of users)
- Pre-compute and cache similarities
- Use approximate nearest neighbors algorithms
- Switch to Item-Based KNN (more scalable)
- Consider Matrix Factorization for large datasets
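A neighbor index avoids recomputing all pairwise similarities per request. scikit-learn's `NearestNeighbors` (shown below) performs exact search; for truly approximate search at larger scale, libraries such as Annoy or FAISS are the usual choices:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy interaction matrix: rows = users, columns = items.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

# Build the index once; each query then returns only the k nearest
# users instead of a full similarity row.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(interactions)
distances, neighbor_ids = index.kneighbors(interactions[[0]])
# Note: the query user appears as its own nearest neighbor (distance 0)
# and should be skipped when aggregating recommendations.
```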
Unstable Recommendations
Issue: Recommendations change frequently as user behavior updates. Solution:
- Use more neighbors for stability
- Weight recent interactions higher
- Apply smoothing to similarity scores
- Consider Item-Based KNN (more stable)
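Weighting recent interactions more heavily is commonly done with exponential decay; a sketch (the half-life value is a tunable assumption, not a recommended default):

```python
import numpy as np

def recency_weights(timestamps, now, half_life_days=30.0):
    """Exponential-decay weights for interactions: one made exactly
    half_life_days ago counts half as much as one made right now."""
    age_days = (now - np.asarray(timestamps, dtype=float)) / 86400.0
    return 0.5 ** (age_days / half_life_days)
```

Multiplying each interaction by its weight before computing similarities makes neighborhoods track current tastes while old signals fade smoothly rather than being cut off.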
Sparsity Issues
Issue: Users with few interactions get poor recommendations. Solution:
- Lower minimum similarity threshold
- Increase number of neighbors
- Combine with content-based features
- Use Matrix Factorization which handles sparsity better
Privacy Concerns
Issue: Recommendations reveal information about other users' preferences. Solution:
- Aggregate neighbor preferences without revealing individuals
- Use item-based approach instead (doesn't expose user similarity)
- Apply differential privacy techniques
- Use Matrix Factorization with encoded representations
Popularity Bias
Issue: Recommendations dominated by popular items liked by many neighbors. Solution:
- Apply inverse frequency weighting
- Use diversity-aware ranking
- Balance popular and niche recommendations
- Consider user's unique tastes in scoring
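Inverse frequency weighting, the first solution above, can be sketched as a log-popularity discount on raw scores (one of several possible weighting schemes, not a canonical formula):

```python
import numpy as np

def discount_popular(scores, interactions):
    """Divide raw item scores by a log-popularity factor so niche
    items are not crowded out by universally liked ones."""
    counts = interactions.sum(axis=0)  # how many users touched each item
    return scores / np.log(2.0 + counts)  # discount grows with popularity
```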
Example Use Cases
Small Community Platform
Scenario: Niche-interest platform with 5k active users and 10k items. Configuration:
- 15 neighbors
- Top-10 recommendations
- Use interaction history (views, likes, comments)
Why: Small user base, strong community with similar interests, explainability valued
Music Discovery Service
Scenario: Music streaming app with 50k users discovering new artists. Configuration:
- 25 neighbors (broader discovery)
- Top-20 recommendations
- Weight recent listens more heavily
Why: Good for discovering new music through similar users, emphasizes serendipity
Book Recommendations
Scenario: Online bookstore with 30k regular readers and 100k books. Configuration:
- 20 neighbors
- Top-15 recommendations
- Use purchase and rating history
Why: Book preferences are personal and nuanced, readers trust recommendations from similar readers, explainable