Recommendation systems drive billions in revenue for platforms like Netflix, Amazon, and Spotify by predicting what users want before they search. This chapter shows how embeddings reshape recommendations: collaborative filtering with learned user and item embeddings that scales to billions of users and items, cold start solutions that leverage content embeddings and meta-learning to recommend for new users and products, real-time personalization with streaming embeddings that adapt to user behavior within seconds, diversity and fairness constraints that prevent filter bubbles and ensure equitable exposure, and cross-domain transfer that reuses learned representations across product categories and platforms. Together, these techniques turn recommendations from simple popularity rankings into personalization engines that capture nuanced preferences at massive scale.
After mastering semantic search across modalities (Chapter 12), the next application is recommendation systems—the engines that power discovery on every major platform. Traditional collaborative filtering (matrix factorization, nearest neighbors) scales poorly beyond millions of users and items, struggles with cold start problems, and requires expensive retraining for updates. Embedding-based recommendations solve these challenges by learning dense vector representations of users and items in a shared latent space, enabling efficient similarity search, transfer learning across domains, and real-time personalization through incremental embedding updates.
13.1 Embedding-Based Collaborative Filtering
Collaborative filtering predicts user preferences from historical interactions (clicks, purchases, ratings). Embedding-based collaborative filtering learns vector representations where users and items close in embedding space have similar preferences, enabling recommendations via nearest neighbor search at billion-user scale.
13.1.1 The Collaborative Filtering Challenge
Traditional collaborative filtering approaches have limitations:
Matrix factorization (SVD, ALS): Expensive to retrain (hours), doesn’t scale to billions, cold start unsolved
Deep learning (Neural CF): Better accuracy but requires careful architecture design
Embedding-based approach: Learn user embeddings u ∈ ℝᵈ and item embeddings i ∈ ℝᵈ such that the relevance score is the dot product u · i; a high score indicates a likely interaction. A minimal sketch of this approach follows below.
In production, pair accuracy metrics with fairness metrics that monitor the exposure distribution across items (Section 13.4 covers this in depth).
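The core of the approach fits in a few lines. Below is a minimal sketch of a two-tower model trained with in-batch negatives; the class name `TwoTowerCF`, the random batch construction, and the 0.05 temperature are illustrative assumptions, not a canonical implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerCF(nn.Module):
    """Minimal two-tower CF: relevance = dot(user embedding, item embedding)."""

    def __init__(self, num_users: int, num_items: int, embedding_dim: int = 128):
        super().__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        u = F.normalize(self.user_embedding(user_ids), p=2, dim=1)
        i = F.normalize(self.item_embedding(item_ids), p=2, dim=1)
        return (u * i).sum(dim=1)  # cosine similarity as relevance score

# One training step with in-batch negatives: each user's observed item is the
# positive; every other item in the batch serves as a negative for that user.
model = TwoTowerCF(num_users=10_000, num_items=5_000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

user_ids = torch.randint(0, 10_000, (256,))  # observed (user, item) pairs
item_ids = torch.randint(0, 5_000, (256,))   # (random here for illustration)

u = F.normalize(model.user_embedding(user_ids), p=2, dim=1)
i = F.normalize(model.item_embedding(item_ids), p=2, dim=1)
logits = u @ i.T / 0.05                      # temperature-scaled similarity matrix
loss = F.cross_entropy(logits, torch.arange(256))  # diagonal entries are positives
loss.backward()
optimizer.step()
```

Because the two towers are independent, item embeddings can be indexed offline in an ANN index while user embeddings are computed at request time.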
13.2 Cold Start Problem Solutions
The cold start problem occurs when new users or items have no interaction history, making collaborative filtering impossible. Cold start solutions leverage content embeddings, meta-learning, and transfer learning to provide quality recommendations from the first interaction.
13.2.1 The Cold Start Challenge
Three cold start scenarios:
New user: No interaction history → cannot estimate preferences
New item: No user interactions → cannot estimate quality
New system: No users or items → cannot learn patterns
Traditional approaches fail:
Collaborative filtering: Requires interaction history
Content-based: Ignores collaborative signal
Popularity-based: Ignores user preferences
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ColdStartRecommender(nn.Module):
    """Hybrid recommender for cold start using content and collaborative signals."""

    def __init__(self, num_users: int, num_items: int,
                 content_dim: int = 256, embedding_dim: int = 128):
        super().__init__()
        # Collaborative embeddings
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        # Content encoder for cold start
        self.content_encoder = nn.Sequential(
            nn.Linear(content_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, embedding_dim),
        )

    def forward(self, user_ids, item_ids, item_features=None, use_content=False):
        """Score user-item pairs with optional content fallback."""
        user_emb = self.user_embedding(user_ids)
        # Use content encoder for cold start items
        if use_content and item_features is not None:
            item_emb = self.content_encoder(item_features)
        else:
            item_emb = self.item_embedding(item_ids)
        # Normalize and score
        user_emb = F.normalize(user_emb, p=2, dim=1)
        item_emb = F.normalize(item_emb, p=2, dim=1)
        scores = (user_emb * item_emb).sum(dim=1)
        return scores

# Usage example
model = ColdStartRecommender(num_users=10000, num_items=5000, content_dim=256)

# For new items without interactions, use content features
new_item_features = torch.randn(1, 256)
user_id = torch.tensor([42])
item_id = torch.tensor([0])
score = model(user_id, item_id, new_item_features, use_content=True)
print(f"Cold start score: {score.item():.3f}")
```
Cold start score: -0.003
Tip: Cold Start Best Practices
Content-based initialization:
Feature quality: High-quality content features are critical
Pre-training: Pre-train content encoder on external data
Fine-tuning: Fine-tune on collaborative signal when available (see Chapter 14 for guidance on choosing the right level of customization)
The code above shows a content_encoder that maps item features to embeddings—but where do the encoder’s weights come from initially?
Option 1: Pre-trained Foundation Models (recommended)
Leverage existing pre-trained models matched to your content type:
| Content Type | Pre-trained Model | Output Dim |
|---|---|---|
| Text (titles, descriptions) | Sentence-BERT, E5, BGE | 384-1024 |
| Images | CLIP, ViT, ResNet | 512-2048 |
| Audio | CLAP, Wav2Vec | 512-768 |
| Structured metadata | TabNet, FT-Transformer | 64-256 |
```python
# Example: Use sentence-transformers for text content
from sentence_transformers import SentenceTransformer

text_encoder = SentenceTransformer('all-MiniLM-L6-v2')
item_description = "Wireless bluetooth headphones with noise cancellation"
content_embedding = text_encoder.encode(item_description)  # 384-dim vector
```
Option 2: Train from Scratch (when domain-specific)
If your content is highly specialized (e.g., patent claims, chemical structures):
Collect content pairs: Items that users interact with together count as "similar"
Contrastive pre-training: Train the encoder so co-interacted items have similar embeddings (a sketch follows below)
Minimum data: Roughly 10K items with content and 100K interactions
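A minimal sketch of the contrastive pre-training step with an InfoNCE-style objective; the 256-dim features, 128-dim output, and 0.07 temperature are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical content encoder: raw item features -> embedding space.
content_encoder = nn.Sequential(
    nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 128)
)
optimizer = torch.optim.Adam(content_encoder.parameters(), lr=1e-3)

# One training step on a batch of co-interacted item pairs (anchor, positive).
anchor_feats = torch.randn(64, 256)    # content features of anchor items
positive_feats = torch.randn(64, 256)  # items co-interacted with the anchors

a = F.normalize(content_encoder(anchor_feats), p=2, dim=1)
p = F.normalize(content_encoder(positive_feats), p=2, dim=1)

# InfoNCE: each anchor's positive is the matching row; all other rows in the
# batch act as negatives.
logits = a @ p.T / 0.07
loss = F.cross_entropy(logits, torch.arange(64))
loss.backward()
optimizer.step()
```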
Option 3: Hybrid Initialization
Start with pre-trained, fine-tune on your domain:
1. Initialize from a pre-trained model
2. Freeze base layers, train a projection head on your collaborative signal
3. Gradually unfreeze layers as you collect more data (see the sketch below)
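A sketch of that recipe using sentence-transformers; the 384-to-128 projection and the unfreezing note are assumptions to adapt to your stack:

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

base = SentenceTransformer('all-MiniLM-L6-v2')  # 1. pre-trained text encoder (384-dim)
projection = nn.Linear(384, 128)                # new head into the recommender's space

# 2. Freeze the base; only the projection head trains on collaborative signal.
for param in base.parameters():
    param.requires_grad = False
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-3)

# 3. As interaction data accumulates, unfreeze the top transformer layers and
#    add them to the optimizer (the exact attribute path depends on the model).
```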
When to transition from content to collaborative?
| Interactions per Item | Strategy |
|---|---|
| 0 | Pure content-based |
| 1-10 | 80% content, 20% collaborative |
| 10-50 | 50% content, 50% collaborative |
| 50+ | 20% content, 80% collaborative |
Monitor recommendation quality (click-through rate, conversion) as you adjust the blend.
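Translated into code, the table becomes a simple blending rule; the thresholds below mirror the table (resolving its boundary overlaps) and are starting points to tune, not fixed constants:

```python
def blended_score(content_score: float, collab_score: float,
                  num_interactions: int) -> float:
    """Blend content-based and collaborative scores by interaction count."""
    if num_interactions == 0:
        w_content = 1.0   # pure content-based
    elif num_interactions <= 10:
        w_content = 0.8
    elif num_interactions <= 50:
        w_content = 0.5
    else:
        w_content = 0.2   # mostly collaborative
    return w_content * content_score + (1 - w_content) * collab_score
```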
13.3 Real-Time Personalization
Traditional recommendation systems update daily or weekly, missing real-time behavior changes. Real-time personalization continuously updates user embeddings from streaming interactions, adapting recommendations within seconds to reflect evolving preferences and context.
13.3.1 The Real-Time Challenge
User preferences change:
Session context: User browsing for gifts has different intent than personal shopping
Temporal trends: User interested in Christmas movies in December, not July
Sequential patterns: User watching action trilogy wants next episode, not random movie
Real-time feedback: User skips recommendations → adjust immediately
Challenge: Update user embeddings in real-time without expensive model retraining.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SessionRecommender(nn.Module):
    """Real-time personalization using session history."""

    def __init__(self, num_items: int, embedding_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        # Session encoder (LSTM)
        self.session_encoder = nn.LSTM(
            input_size=embedding_dim,
            hidden_size=hidden_dim,
            num_layers=2,
            batch_first=True,
            dropout=0.2,
        )
        # Output projection
        self.projection = nn.Linear(hidden_dim, embedding_dim)

    def encode_session(self, session_item_ids: torch.Tensor):
        """Encode user session to embedding."""
        # Embed session items
        item_embs = self.item_embedding(session_item_ids)
        # LSTM encoding
        _, (hidden, _) = self.session_encoder(item_embs)
        # Project to item space
        session_emb = self.projection(hidden[-1])
        return F.normalize(session_emb, p=2, dim=1)

    def recommend_next(self, session_item_ids: torch.Tensor, k: int = 10):
        """Recommend next items based on session history."""
        # Encode session
        session_emb = self.encode_session(session_item_ids)
        # Get all item embeddings
        all_items = torch.arange(self.item_embedding.num_embeddings).to(session_item_ids.device)
        item_embs = F.normalize(self.item_embedding(all_items), p=2, dim=1)
        # Compute scores
        scores = torch.matmul(item_embs, session_emb.T).squeeze()
        # Return top-k
        top_scores, top_indices = torch.topk(scores, k)
        return all_items[top_indices], top_scores

# Usage example
model = SessionRecommender(num_items=5000, embedding_dim=128)
session = torch.tensor([[10, 25, 42, 100]])  # User browsing history
recommended_items, scores = model.recommend_next(session, k=5)
print(f"Next item recommendations: {recommended_items.tolist()}")
```
Next item recommendations: [3759, 2846, 1675, 3444, 4042]
Tip: Real-Time Personalization Best Practices
Architecture:
Streaming infrastructure: Kafka/Kinesis for event ingestion
Session state: Redis/Memcached for fast session access
Incremental updates: Update embeddings without full recomputation
Cache strategy: Invalidate user cache on interactions
Hybrid fusion: Base (long-term) + session (short-term) embeddings (see the sketch after this tip)
Performance:
Latency target: p95 < 100ms for embedding computation
Throughput: 10K+ updates/second per node
Batching: Micro-batch events for GPU efficiency
Fallback: Serve base embedding if session computation times out
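The hybrid fusion bullet can be as simple as a weighted sum of long-term and session embeddings. This sketch assumes both live in the same 128-dim space; `session_weight` is an illustrative parameter to tune, and some systems raise it as the session grows longer:

```python
import torch
import torch.nn.functional as F

def fuse_user_embedding(base_emb: torch.Tensor, session_emb: torch.Tensor,
                        session_weight: float = 0.6) -> torch.Tensor:
    """Blend long-term (base) and short-term (session) user embeddings."""
    fused = (1 - session_weight) * base_emb + session_weight * session_emb
    return F.normalize(fused, p=2, dim=-1)

# Usage: base_emb comes from the offline collaborative model, session_emb from
# SessionRecommender.encode_session above.
base_emb = F.normalize(torch.randn(1, 128), dim=-1)
session_emb = F.normalize(torch.randn(1, 128), dim=-1)
user_emb = fuse_user_embedding(base_emb, session_emb)
```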
13.4 Diversity and Fairness in Recommendations
Purely accuracy-optimized recommenders create filter bubbles: users see only items similar to past behavior, reducing diversity and creating unfair exposure for long-tail items. Diversity and fairness constraints ensure recommendations span categories, promote exploration, and provide equitable exposure.
13.4.1 The Diversity Challenge
Accuracy-optimized systems suffer from:
Filter bubbles: Users trapped in narrow content silos
Popularity bias: Popular items recommended excessively
Homogeneity: All recommendations similar to each other
Unfair exposure: Long-tail items never discovered
Goal: Balance accuracy, diversity, and fairness.
```python
import torch

class DiversityReranker:
    """MMR-based reranking for diversity."""

    def __init__(self, lambda_param: float = 0.3):
        self.lambda_param = lambda_param  # Balance relevance vs diversity

    def rerank(self, query_emb, candidate_embs, candidate_scores, k: int = 10):
        """Rerank candidates using Maximal Marginal Relevance (MMR).

        MMR selects items that are relevant to the query but diverse
        from each other.
        """
        selected = []
        selected_embs = []
        remaining_indices = list(range(len(candidate_embs)))

        for _ in range(min(k, len(candidate_embs))):
            mmr_scores = []
            for idx in remaining_indices:
                # Relevance score
                relevance = candidate_scores[idx]
                # Diversity penalty (max similarity to selected items)
                if selected_embs:
                    similarities = [torch.dot(candidate_embs[idx], s)
                                    for s in selected_embs]
                    diversity_penalty = max(similarities)
                else:
                    diversity_penalty = 0.0
                # MMR score
                mmr = self.lambda_param * relevance - (1 - self.lambda_param) * diversity_penalty
                mmr_scores.append((idx, mmr))

            # Select item with highest MMR score
            best_idx, best_score = max(mmr_scores, key=lambda x: x[1])
            selected.append(best_idx)
            selected_embs.append(candidate_embs[best_idx])
            remaining_indices.remove(best_idx)

        return selected

# Usage example
reranker = DiversityReranker(lambda_param=0.3)
query = torch.randn(128)
candidates = torch.randn(50, 128)
scores = torch.rand(50)
diverse_ranking = reranker.rerank(query, candidates, scores, k=10)
print(f"Diverse top-10 ranking: {diverse_ranking}")
```
When evaluating diversity interventions:
Long-term metrics: Optimize for session success, not click-through rate
A/B testing: Measure impact on retention and lifetime value
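On the fairness side, a Gini coefficient over item exposure counts is a common inequality summary (0 means perfectly equal exposure, 1 means one item receives everything). A small sketch using the standard sorted-index formula:

```python
import numpy as np

def exposure_gini(exposure_counts: np.ndarray) -> float:
    """Gini coefficient of item exposure: 0 = equal, 1 = maximally unequal."""
    x = np.sort(np.asarray(exposure_counts, dtype=float))
    n = len(x)
    total = x.sum()
    if total == 0:
        return 0.0  # no exposure recorded yet
    # Sorted-index form: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    i = np.arange(1, n + 1)
    return 2 * np.sum(i * x) / (n * total) - (n + 1) / n

# Example: one item dominates exposure
print(exposure_gini(np.array([1000, 10, 5, 3, 2])))  # ~0.79, highly unequal
```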
13.5 Cross-Domain Recommendation Transfer
Users interact across multiple domains (products, movies, music), but traditional systems treat each domain independently. Cross-domain recommendation transfer leverages learned embeddings to transfer knowledge across domains, enabling better cold start and improved recommendations in data-sparse domains.
13.5.1 The Cross-Domain Challenge
Challenges of multi-domain systems:
Data sparsity: Some domains have limited interactions (e.g., luxury goods)
Cold start: New domain with no historical data
Shared preferences: User preferences correlate across domains (action movies → action games)
Different scales: Domains have different numbers of items and interaction frequencies
Opportunity: Transfer learning from data-rich to data-sparse domains.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Dict

class CrossDomainRecommender(nn.Module):
    """Multi-domain recommender with shared user embeddings."""

    def __init__(self, embedding_dim: int = 128, num_users: int = 1000000,
                 num_items_per_domain: Dict[str, int] = None):
        super().__init__()
        self.embedding_dim = embedding_dim
        # Shared user encoder across domains
        self.user_encoder = nn.Embedding(num_users, embedding_dim)
        # Domain-specific item encoders
        self.item_encoders = nn.ModuleDict()
        for domain, num_items in num_items_per_domain.items():
            self.item_encoders[domain] = nn.Embedding(num_items, embedding_dim)
        self.domains = list(num_items_per_domain.keys())

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor, domain: str):
        """Predict scores for user-item pairs in given domain."""
        # Encode users (shared across domains)
        user_emb = F.normalize(self.user_encoder(user_ids), p=2, dim=1)
        # Encode items (domain-specific)
        item_emb = F.normalize(self.item_encoders[domain](item_ids), p=2, dim=1)
        # Dot product scoring
        scores = (user_emb * item_emb).sum(dim=1)
        return scores

    def recommend_cross_domain(self, user_id: int, domain: str, k: int = 10):
        """Recommend items from a specific domain."""
        user_tensor = torch.tensor([user_id])
        user_emb = F.normalize(self.user_encoder(user_tensor), p=2, dim=1)
        # Get all items in domain
        num_items = self.item_encoders[domain].num_embeddings
        all_items = torch.arange(num_items)
        item_embs = F.normalize(self.item_encoders[domain](all_items), p=2, dim=1)
        # Compute scores
        scores = torch.matmul(item_embs, user_emb.T).squeeze()
        # Top-k
        top_scores, top_indices = torch.topk(scores, k)
        return all_items[top_indices], top_scores

# Usage example
model = CrossDomainRecommender(
    embedding_dim=64,
    num_users=1000,
    num_items_per_domain={'movies': 10000, 'books': 5000},
)

# Recommend books based on movie preferences (shared user embedding)
recommended_books, scores = model.recommend_cross_domain(user_id=42, domain='books', k=5)
print(f"Cross-domain recommendations (movies → books): {recommended_books.tolist()}")
```
Architecture patterns:
Shared user encoder: Single embedding space for users across domains
Domain-specific item encoders: Separate embeddings per domain
Domain bridges: Learn mappings between domain embeddings
Multi-task learning: Joint optimization with domain-specific losses
Transfer strategies: (see Chapter 14 for a detailed decision framework)
Pre-train + fine-tune: Train on rich domain, fine-tune on sparse
Freeze encoder: Transfer user encoder, train only item encoder (sketched after this list)
Gradual unfreezing: Progressively unfreeze layers during fine-tuning
Regularization: L2 penalty to keep close to source weights
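The freeze-encoder strategy maps directly onto the CrossDomainRecommender above; a sketch, with domain names and learning rate as illustrative assumptions:

```python
import torch

model = CrossDomainRecommender(
    embedding_dim=64, num_users=1000,
    num_items_per_domain={'movies': 10000, 'books': 5000},
)

# ... pre-train on the data-rich 'movies' domain ...

# Transfer: freeze the shared user encoder, then train only the sparse
# domain's item encoder.
model.user_encoder.weight.requires_grad = False
optimizer = torch.optim.Adam(model.item_encoders['books'].parameters(), lr=1e-3)

# Optionally add an L2 penalty pulling fine-tuned shared weights back toward
# their pre-trained values (the regularization bullet above).
```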
Evaluation:
Cross-domain metrics: Measure improvement in sparse domain
Cold start impact: Test on new users/items in sparse domain
Transfer quality: Correlation between domain preferences
Negative transfer: Monitor for cases where transfer hurts performance
13.6 Key Takeaways
Embedding-based collaborative filtering scales to billions of users and items: Two-tower architecture with separate user and item encoders enables independent updates, fast serving via ANN search, and efficient training with negative sampling
Cold start solutions leverage content and meta-learning: Content-based initialization provides embeddings for new items from features, meta-learning (MAML) enables adaptation from 1-5 interactions, and hybrid models smoothly transition from content to collaborative signals
Real-time personalization adapts recommendations within seconds: Session embeddings computed from recent interactions combine with base embeddings to reflect current intent, with streaming architectures enabling sub-100ms latency for embedding updates
Diversity and fairness prevent filter bubbles and ensure equitable exposure: MMR (Maximal Marginal Relevance) balances accuracy and diversity, calibrated recommendations match user preference distributions, and fairness monitoring tracks coverage and inequality via Gini coefficients
Cross-domain transfer leverages shared user preferences: Shared user encoders across domains enable knowledge transfer, pre-training on rich domains improves sparse domains, and multi-task learning jointly optimizes across product categories
Production recommenders require careful trade-off management: Accuracy vs diversity, short-term clicks vs long-term engagement, popularity vs fairness, and collaborative vs content signals all require tuning based on business objectives and user research
Embedding dimensionality impacts both quality and cost: 64-128 dims sufficient for most applications, 256-512 dims for complex domains (fashion, media), with higher dimensions improving accuracy but increasing storage (roughly 200GB for 100M items at 512-dim float32) and latency
13.7 Looking Ahead
Part V (Industry Applications) begins with Chapter 26, which covers security and automation patterns that apply across all industries: cybersecurity threat hunting, behavioral anomaly detection, and embedding-driven business rules. These cross-cutting concerns form the foundation for the industry-specific chapters that follow.
13.8 Further Reading
13.8.1 Collaborative Filtering
Koren, Yehuda, Robert Bell, and Chris Volinsky (2009). “Matrix Factorization Techniques for Recommender Systems.” IEEE Computer.
He, Xiangnan, et al. (2017). “Neural Collaborative Filtering.” WWW.
Rendle, Steffen, et al. (2009). “BPR: Bayesian Personalized Ranking from Implicit Feedback.” UAI.
Yi, Xinyang, et al. (2019). “Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations.” RecSys.
13.8.2 Cold Start and Meta-Learning
Finn, Chelsea, Pieter Abbeel, and Sergey Levine (2017). “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.” ICML.
Vasile, Flavian, Elena Smirnova, and Alexis Conneau (2016). “Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation.” RecSys.
Bharadhwaj, Homanga, et al. (2019). “Meta-Learning for User Cold-Start Recommendation.” IJCNN.
Lee, Hoyeop, et al. (2019). “MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation.” KDD.
13.8.3 Real-Time Personalization
Hidasi, Balázs, et al. (2016). “Session-based Recommendations with Recurrent Neural Networks.” ICLR.
Li, Jing, et al. (2017). “Neural Attentive Session-based Recommendation.” CIKM.
Quadrana, Massimo, et al. (2017). “Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks.” RecSys.
Wu, Shu, et al. (2019). “Session-based Recommendation with Graph Neural Networks.” AAAI.
13.8.4 Diversity and Fairness
Carbonell, Jaime, and Jade Goldstein (1998). “The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries.” SIGIR.
Steck, Harald (2018). “Calibrated Recommendations.” RecSys.
Abdollahpouri, Himan, et al. (2019). “Managing Popularity Bias in Recommender Systems with Personalized Re-ranking.” FLAIRS.
Mehrotra, Rishabh, et al. (2018). “Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off Between Relevance, Fairness & Satisfaction in Recommendation Systems.” CIKM.
13.8.5 Cross-Domain Recommendations
Fernández-Tobías, Ignacio, et al. (2016). “Cross-domain Recommender Systems: A Survey of the State of the Art.” UMAP.
Hu, Guangneng, Yu Zhang, and Qiang Yang (2018). “CoNet: Collaborative Cross Networks for Cross-Domain Recommendation.” CIKM.
Zhu, Feng, et al. (2021). “Transfer-Meta Framework for Cross-domain Recommendation to Cold-Start Users.” SIGIR.
Man, Tong, et al. (2017). “Cross-Domain Recommendation: An Embedding and Mapping Approach.” IJCAI.