Neural Collaborative Filtering - Deep Learning for Recommendation Systems
Neural Collaborative Filtering (NCF) replaces the fixed inner product of traditional matrix factorization with an interaction function learned by a deep neural network. Introduced in the paper "Neural Collaborative Filtering" (He et al., 2017), the approach can model complex user-item interactions that a simple dot product cannot capture.
Introduction to Recommendation Systems
Recommendation systems are everywhere in modern applications - from Netflix suggesting movies to Amazon recommending products. These systems aim to predict user preferences and suggest items that users are likely to interact with.
Traditional recommendation approaches include:
- Content-Based Filtering: Recommends items similar to those a user has liked before
- Collaborative Filtering: Leverages patterns from multiple users to make recommendations
- Hybrid Methods: Combine multiple approaches for better results
Collaborative filtering is based on the idea that users who agreed in the past tend to agree in the future: if User A and User B both liked items 1, 2, and 3, they are likely to have similar opinions about item 4.
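To make that concrete, here is a minimal, purely illustrative sketch of user-based collaborative filtering on a made-up rating matrix (the numbers and the predict_user_based helper are hypothetical, and this is not yet the neural approach discussed below):

import numpy as np

# Toy ratings matrix (rows = users, columns = items); 0 means "not rated yet".
ratings = np.array([
    [5.0, 4.0, 4.0, 0.0],   # User A - rating for item 4 is unknown
    [5.0, 5.0, 4.0, 4.0],   # User B
    [1.0, 2.0, 1.0, 5.0],   # User C
])

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the target item."""
    sims, weighted = [], []
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        u, v = ratings[user], ratings[other]
        mask = (u > 0) & (v > 0)  # compare only on co-rated items
        sim = np.dot(u[mask], v[mask]) / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask]))
        sims.append(sim)
        weighted.append(sim * ratings[other, item])
    return sum(weighted) / sum(sims)

print(predict_user_based(ratings, user=0, item=3))  # User A's estimated rating for item 4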
Traditional Matrix Factorization
Before deep learning, Matrix Factorization (MF) was the dominant approach for collaborative filtering. The idea is simple yet powerful:
The Core Concept
Represent users and items as latent feature vectors in a shared embedding space. The interaction between user u and item i is modeled as the inner product of their latent vectors:
ŷᵤᵢ = pᵤᵀ · qᵢ
Where:
- pᵤ is the latent vector for user u
- qᵢ is the latent vector for item i
- ŷᵤᵢ is the predicted interaction score
Limitations of Matrix Factorization
- Linear Assumption: The inner product is a linear operation, limiting its ability to capture complex user-item interactions
- Fixed Interaction Function: Cannot learn more sophisticated interaction patterns
- Limited Expressiveness: May not adequately represent the complexity of user preferences
# Traditional Matrix Factorization example
import numpy as np

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=20):
        self.user_factors = np.random.normal(0, 0.1, (n_users, n_factors))
        self.item_factors = np.random.normal(0, 0.1, (n_items, n_factors))

    def predict(self, user_id, item_id):
        # Simple dot product - linear interaction
        return np.dot(self.user_factors[user_id], self.item_factors[item_id])
Neural Collaborative Filtering (NCF)
NCF addresses the limitations of matrix factorization by using neural networks to learn the interaction function from data, rather than assuming a fixed inner product.
The NCF Framework
The NCF framework consists of three main components:
- Embedding Layer: Maps sparse user and item IDs to dense vectors
- Neural CF Layers: Learn the interaction function between user and item embeddings
- Output Layer: Predicts the interaction score
import torch
import torch.nn as nn

class NCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32, hidden_layers=[64, 32, 16]):
        super(NCF, self).__init__()
        # Embedding layers
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        # Neural CF layers
        layers = []
        input_dim = embedding_dim * 2
        for hidden_dim in hidden_layers:
            layers.append(nn.Linear(input_dim, hidden_dim))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))
            input_dim = hidden_dim
        self.fc_layers = nn.Sequential(*layers)
        # Output layer
        self.output_layer = nn.Linear(hidden_layers[-1], 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        # Get embeddings
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        # Concatenate embeddings
        x = torch.cat([user_embed, item_embed], dim=-1)
        # Pass through neural layers
        x = self.fc_layers(x)
        # Get prediction
        output = self.sigmoid(self.output_layer(x))
        return output.squeeze()
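A quick smoke test of the class above can help confirm the expected shapes; the sizes and IDs here are arbitrary. The model maps a batch of (user, item) ID pairs to scores in (0, 1), which is what the binary cross-entropy training described later expects:

# Hypothetical smoke test with arbitrary sizes and IDs
model = NCF(n_users=100, n_items=500)
users = torch.tensor([0, 1, 2])
items = torch.tensor([10, 20, 30])
scores = model(users, items)
print(scores.shape)  # torch.Size([3]), each score in (0, 1)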
Architecture Variants
The original NCF paper proposed three specific models:
1. Generalized Matrix Factorization (GMF)
GMF generalizes traditional matrix factorization: it takes the element-wise product of the user and item embeddings and learns the weights of the output layer applied to it:
class GMF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32):
        super(GMF, self).__init__()
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        self.output_layer = nn.Linear(embedding_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        # Element-wise product
        interaction = user_embed * item_embed
        output = self.sigmoid(self.output_layer(interaction))
        return output.squeeze()
2. Multi-Layer Perceptron (MLP)
MLP uses multiple neural layers to learn the interaction function:
class MLP(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32, layers=[64, 32, 16, 8]):
        super(MLP, self).__init__()
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        mlp_modules = []
        input_size = embedding_dim * 2
        for layer_size in layers:
            mlp_modules.append(nn.Linear(input_size, layer_size))
            mlp_modules.append(nn.ReLU())
            mlp_modules.append(nn.BatchNorm1d(layer_size))
            mlp_modules.append(nn.Dropout(0.2))
            input_size = layer_size
        self.mlp_layers = nn.Sequential(*mlp_modules)
        self.output_layer = nn.Linear(layers[-1], 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        # Concatenate embeddings
        x = torch.cat([user_embed, item_embed], dim=-1)
        x = self.mlp_layers(x)
        output = self.sigmoid(self.output_layer(x))
        return output.squeeze()
3. Neural Matrix Factorization (NeuMF)
NeuMF combines GMF and MLP to leverage both linear and non-linear interactions:
class NeuMF(nn.Module):
    def __init__(self, n_users, n_items, mf_dim=8, mlp_dim=32, layers=[64, 32, 16, 8]):
        super(NeuMF, self).__init__()
        # GMF embeddings
        self.user_embedding_gmf = nn.Embedding(n_users, mf_dim)
        self.item_embedding_gmf = nn.Embedding(n_items, mf_dim)
        # MLP embeddings
        self.user_embedding_mlp = nn.Embedding(n_users, mlp_dim)
        self.item_embedding_mlp = nn.Embedding(n_items, mlp_dim)
        # MLP layers
        mlp_modules = []
        input_size = mlp_dim * 2
        for layer_size in layers:
            mlp_modules.append(nn.Linear(input_size, layer_size))
            mlp_modules.append(nn.ReLU())
            mlp_modules.append(nn.BatchNorm1d(layer_size))
            mlp_modules.append(nn.Dropout(0.2))
            input_size = layer_size
        self.mlp_layers = nn.Sequential(*mlp_modules)
        # Final prediction layer
        self.output_layer = nn.Linear(mf_dim + layers[-1], 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        # GMF part
        user_embed_gmf = self.user_embedding_gmf(user_ids)
        item_embed_gmf = self.item_embedding_gmf(item_ids)
        gmf_output = user_embed_gmf * item_embed_gmf
        # MLP part
        user_embed_mlp = self.user_embedding_mlp(user_ids)
        item_embed_mlp = self.item_embedding_mlp(item_ids)
        mlp_input = torch.cat([user_embed_mlp, item_embed_mlp], dim=-1)
        mlp_output = self.mlp_layers(mlp_input)
        # Concatenate GMF and MLP outputs
        concat = torch.cat([gmf_output, mlp_output], dim=-1)
        output = self.sigmoid(self.output_layer(concat))
        return output.squeeze()
Training Neural Collaborative Filtering
Data Preparation
NCF typically works with implicit feedback (clicks, views, purchases) rather than explicit ratings:
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class ImplicitFeedbackDataset(Dataset):
    def __init__(self, interactions, n_users, n_items, negative_samples=4):
        """
        Args:
            interactions: DataFrame with columns [user_id, item_id]
            n_users: Total number of users
            n_items: Total number of items
            negative_samples: Number of negative samples per positive sample
        """
        self.interactions = interactions
        self.n_users = n_users
        self.n_items = n_items
        self.negative_samples = negative_samples
        # Map each user to the set of items they interacted with, for negative sampling
        self.user_items = interactions.groupby('user_id')['item_id'].apply(set).to_dict()

    def __len__(self):
        return len(self.interactions) * (1 + self.negative_samples)

    def __getitem__(self, idx):
        # Determine whether this index is a positive or a negative sample
        pos_idx = idx // (1 + self.negative_samples)
        is_positive = (idx % (1 + self.negative_samples)) == 0
        user_id = self.interactions.iloc[pos_idx]['user_id']
        if is_positive:
            item_id = self.interactions.iloc[pos_idx]['item_id']
            label = 1.0
        else:
            # Sample a negative item (one the user has not interacted with)
            while True:
                item_id = np.random.randint(0, self.n_items)
                if item_id not in self.user_items.get(user_id, set()):
                    break
            label = 0.0
        return {
            'user_id': torch.tensor(user_id, dtype=torch.long),
            'item_id': torch.tensor(item_id, dtype=torch.long),
            'label': torch.tensor(label, dtype=torch.float)
        }
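A minimal wiring example for the dataset class above; the tiny interactions DataFrame is made up, and the variable name train_loader simply matches what the training pipeline further below assumes:

# Hypothetical interactions table; real data would come from logs or a ratings file
interactions = pd.DataFrame({
    'user_id': [0, 0, 1, 2],
    'item_id': [5, 7, 3, 9],
})
train_dataset = ImplicitFeedbackDataset(interactions, n_users=1000, n_items=2000, negative_samples=4)
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)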
Loss Function
For implicit feedback, binary cross-entropy loss is commonly used:
def train_epoch(model, dataloader, optimizer, device):
    model.train()
    total_loss = 0
    criterion = nn.BCELoss()
    for batch in dataloader:
        user_ids = batch['user_id'].to(device)
        item_ids = batch['item_id'].to(device)
        labels = batch['label'].to(device)
        # Forward pass
        predictions = model(user_ids, item_ids)
        loss = criterion(predictions, labels)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(dataloader)
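For reference, the quantity minimized above is the pointwise log loss over observed interactions (label 1) and sampled negatives (label 0):

$$\mathcal{L} = -\sum_{(u,i)} \Big[\, y_{ui}\,\log \hat{y}_{ui} + (1 - y_{ui})\,\log\big(1 - \hat{y}_{ui}\big) \Big]$$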
Evaluation Metrics
Common metrics for recommendation systems include:
- Hit Rate (HR@K): Percentage of test cases where the true item appears in top-K recommendations
- Normalized Discounted Cumulative Gain (NDCG@K): Measures ranking quality with position-based discounting
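Under the common leave-one-out evaluation protocol with a single held-out item per user (which the code below follows), these reduce to:

$$\mathrm{HR@}K = \mathbb{1}\,[\mathrm{rank} \le K], \qquad \mathrm{NDCG@}K = \frac{\mathbb{1}\,[\mathrm{rank} \le K]}{\log_2(\mathrm{rank} + 1)}$$

where rank is the position of the held-out item among the scored candidates.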
def evaluate_model(model, test_data, k=10):
    """
    Evaluate model using Hit Rate and NDCG
    """
    model.eval()
    device = next(model.parameters()).device  # keep tensors on the model's device
    hits = []
    ndcgs = []
    with torch.no_grad():
        for user_id, true_item, candidates in test_data:
            # Get predictions for all candidate items
            user_ids = torch.tensor([user_id] * len(candidates), device=device)
            item_ids = torch.tensor(candidates, device=device)
            predictions = model(user_ids, item_ids)
            # Get top-K recommendations
            _, top_k_indices = torch.topk(predictions, k)
            recommended_items = [candidates[i] for i in top_k_indices]
            # Calculate Hit Rate
            hit = 1.0 if true_item in recommended_items else 0.0
            hits.append(hit)
            # Calculate NDCG
            if true_item in recommended_items:
                rank = recommended_items.index(true_item) + 1
                ndcg = 1.0 / np.log2(rank + 1)
            else:
                ndcg = 0.0
            ndcgs.append(ndcg)
    return {
        'HR@{}'.format(k): np.mean(hits),
        'NDCG@{}'.format(k): np.mean(ndcgs)
    }
Complete Training Pipeline
Here's a complete example of training an NCF model:
import torch
import torch.optim as optim
from torch.utils.data import DataLoader

# Hyperparameters
n_users = 1000
n_items = 2000
embedding_dim = 32
hidden_layers = [64, 32, 16, 8]
learning_rate = 0.001
batch_size = 256
n_epochs = 20

# Initialize model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = NeuMF(n_users, n_items, mf_dim=8, mlp_dim=32, layers=hidden_layers).to(device)

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)

# Learning rate scheduler
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3, verbose=True
)

# Training loop
best_hr = 0
for epoch in range(n_epochs):
    # Train
    train_loss = train_epoch(model, train_loader, optimizer, device)
    # Evaluate
    metrics = evaluate_model(model, test_data, k=10)
    # Update learning rate
    scheduler.step(train_loss)
    # Print progress
    print(f"Epoch {epoch+1}/{n_epochs}")
    print(f"  Train Loss: {train_loss:.4f}")
    print(f"  HR@10: {metrics['HR@10']:.4f}")
    print(f"  NDCG@10: {metrics['NDCG@10']:.4f}")
    # Save best model
    if metrics['HR@10'] > best_hr:
        best_hr = metrics['HR@10']
        torch.save(model.state_dict(), 'best_ncf_model.pth')
        print("  New best model saved!")
Advanced Techniques
1. Pre-training Strategy
The original NCF paper suggests pre-training GMF and MLP separately, then using their weights to initialize NeuMF:
def pretrain_neumf(gmf_model, mlp_model, n_users, n_items, layers=[64, 32, 16, 8], alpha=0.5):
    """
    Initialize NeuMF with pre-trained GMF and MLP models.
    The GMF embedding size must equal NeuMF's mf_dim, and the MLP embedding
    size and layer sizes must match NeuMF's mlp_dim and layers.
    Args:
        gmf_model: Pre-trained GMF model
        mlp_model: Pre-trained MLP model
        alpha: Weight for combining GMF and MLP predictions
    """
    mf_dim = gmf_model.user_embedding.embedding_dim
    mlp_dim = mlp_model.user_embedding.embedding_dim
    neumf = NeuMF(n_users, n_items, mf_dim=mf_dim, mlp_dim=mlp_dim, layers=layers)
    # Transfer GMF embeddings
    neumf.user_embedding_gmf.weight.data = gmf_model.user_embedding.weight.data
    neumf.item_embedding_gmf.weight.data = gmf_model.item_embedding.weight.data
    # Transfer MLP embeddings
    neumf.user_embedding_mlp.weight.data = mlp_model.user_embedding.weight.data
    neumf.item_embedding_mlp.weight.data = mlp_model.item_embedding.weight.data
    # Transfer MLP layers
    neumf.mlp_layers.load_state_dict(mlp_model.mlp_layers.state_dict())
    # Initialize the output layer with a weighted concatenation of the two
    # pre-trained prediction layers
    neumf.output_layer.weight.data = torch.cat([
        alpha * gmf_model.output_layer.weight.data,
        (1 - alpha) * mlp_model.output_layer.weight.data
    ], dim=1)
    return neumf
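A sketch of how the helper might be used, assuming the GMF and MLP classes defined earlier and embedding sizes chosen to line up with NeuMF's defaults (the variable names here are hypothetical):

# Embedding sizes must match the target NeuMF configuration (mf_dim=8, mlp_dim=32)
gmf = GMF(n_users, n_items, embedding_dim=8)
mlp = MLP(n_users, n_items, embedding_dim=32, layers=[64, 32, 16, 8])
# ... train gmf and mlp separately (e.g., with train_epoch) ...
neumf = pretrain_neumf(gmf, mlp, n_users, n_items, layers=[64, 32, 16, 8], alpha=0.5)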
2. Regularization Techniques
import torch.nn.functional as F

class RegularizedNCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32, dropout=0.2):
        super(RegularizedNCF, self).__init__()
        # Embeddings
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        # Initialize with smaller values
        nn.init.normal_(self.user_embedding.weight, std=0.01)
        nn.init.normal_(self.item_embedding.weight, std=0.01)
        # Layers with batch normalization and dropout
        self.fc1 = nn.Linear(embedding_dim * 2, 64)
        self.dropout1 = nn.Dropout(dropout)
        self.bn1 = nn.BatchNorm1d(64)
        self.fc2 = nn.Linear(64, 32)
        self.dropout2 = nn.Dropout(dropout)
        self.bn2 = nn.BatchNorm1d(32)
        self.fc3 = nn.Linear(32, 16)
        self.dropout3 = nn.Dropout(dropout)
        self.bn3 = nn.BatchNorm1d(16)
        self.output = nn.Linear(16, 1)

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        x = torch.cat([user_embed, item_embed], dim=-1)
        x = self.fc1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.dropout3(x)
        output = torch.sigmoid(self.output(x))
        return output.squeeze()
3. Handling Cold Start Problem
For new users or items with few interactions:
class NCFWithSideInfo(nn.Module):
    """NCF with additional side information (features) for cold start"""
    def __init__(self, n_users, n_items, user_features_dim, item_features_dim,
                 embedding_dim=32):
        super(NCFWithSideInfo, self).__init__()
        # ID embeddings
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        # Feature projection layers
        self.user_feature_proj = nn.Linear(user_features_dim, embedding_dim)
        self.item_feature_proj = nn.Linear(item_features_dim, embedding_dim)
        # Interaction layers
        self.fc1 = nn.Linear(embedding_dim * 2, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 16)
        self.output = nn.Linear(16, 1)

    def forward(self, user_ids, item_ids, user_features=None, item_features=None):
        # Get ID embeddings
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        # Incorporate features if available
        if user_features is not None:
            user_feat_embed = self.user_feature_proj(user_features)
            user_embed = user_embed + user_feat_embed
        if item_features is not None:
            item_feat_embed = self.item_feature_proj(item_features)
            item_embed = item_embed + item_feat_embed
        # Standard NCF forward pass
        x = torch.cat([user_embed, item_embed], dim=-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        output = torch.sigmoid(self.output(x))
        return output.squeeze()
Practical Considerations
1. Scalability
For large-scale systems with millions of users and items:
# Use mixed precision training
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for batch in dataloader:
    user_ids = batch['user_id'].to(device)
    item_ids = batch['item_id'].to(device)
    labels = batch['label'].to(device)
    optimizer.zero_grad()
    with autocast():
        predictions = model(user_ids, item_ids)
        loss = criterion(predictions, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
2. Online Learning
Update model with new user interactions:
class OnlineNCF:
    def __init__(self, model, learning_rate=0.001):
        self.model = model
        self.optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        self.criterion = nn.BCELoss()

    def update(self, user_id, item_id, label):
        """Update the model with a single new interaction"""
        # Note: models that use BatchNorm need batches larger than one in
        # training mode; accumulate interactions into mini-batches for those.
        self.model.train()
        device = next(self.model.parameters()).device
        user_id = torch.tensor([user_id], dtype=torch.long, device=device)
        item_id = torch.tensor([item_id], dtype=torch.long, device=device)
        label = torch.tensor([label], dtype=torch.float, device=device)
        prediction = self.model(user_id, item_id).view(-1)
        loss = self.criterion(prediction, label)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
3. A/B Testing Framework
class RecommendationABTest:
    def __init__(self, model_a, model_b):
        self.model_a = model_a
        self.model_b = model_b
        self.results = {'A': [], 'B': []}

    def get_recommendations(self, user_id, candidate_items, variant):
        """Get recommendations from the specified model variant"""
        model = self.model_a if variant == 'A' else self.model_b
        user_ids = torch.tensor([user_id] * len(candidate_items))
        item_ids = torch.tensor(candidate_items)
        with torch.no_grad():
            scores = model(user_ids, item_ids)
        # Return top-K items
        _, top_indices = torch.topk(scores, k=10)
        return [candidate_items[i] for i in top_indices]

    def record_outcome(self, variant, clicked):
        """Record whether the user clicked on any recommended item"""
        self.results[variant].append(1 if clicked else 0)

    def get_statistics(self):
        """Calculate CTR for each variant"""
        ctr_a = np.mean(self.results['A'])
        ctr_b = np.mean(self.results['B'])
        return {
            'CTR_A': ctr_a,
            'CTR_B': ctr_b,
            'Improvement': (ctr_b - ctr_a) / ctr_a * 100
        }
Comparison with Other Methods
NCF vs Traditional Collaborative Filtering
| Aspect | Traditional CF | NCF |
|---|---|---|
| Interaction Function | Fixed (dot product) | Learned from data |
| Non-linearity | Limited | High capacity |
| Feature Engineering | Manual | Automatic |
| Training Complexity | Lower | Higher |
| Expressiveness | Limited | High |
| Interpretability | Better | More challenging |
Performance Benchmarks
Based on the original paper's experiments on MovieLens and Pinterest datasets:
# Example results (HR@10)
results = {
    'MF': 0.692,
    'GMF': 0.708,
    'MLP': 0.713,
    'NeuMF': 0.726  # Best performance
}

# Relative improvement
improvement_over_mf = (results['NeuMF'] - results['MF']) / results['MF']
print(f"NeuMF improves HR@10 by {improvement_over_mf*100:.2f}% over traditional MF")
Real-World Applications
1. E-commerce Product Recommendations
class ProductRecommender:
    def __init__(self, model, item_catalog):
        self.model = model
        self.item_catalog = item_catalog  # Product metadata

    def recommend_products(self, user_id, exclude_purchased=True,
                           category_filter=None, k=10):
        """
        Recommend products for a user with business constraints.
        (get_candidate_items and apply_business_rules are application-specific
        helpers, not shown here.)
        """
        # Get candidate items
        candidate_items = self.get_candidate_items(
            user_id,
            exclude_purchased=exclude_purchased,
            category_filter=category_filter
        )
        # Get predictions
        user_ids = torch.tensor([user_id] * len(candidate_items))
        item_ids = torch.tensor(candidate_items)
        with torch.no_grad():
            scores = self.model(user_ids, item_ids)
        # Apply business rules (e.g., diversity, freshness)
        recommendations = self.apply_business_rules(
            candidate_items, scores, k=k
        )
        return recommendations
2. Content Streaming Services
class VideoRecommender:
    def __init__(self, model):
        self.model = model

    def recommend_videos(self, user_id, context=None, k=10):
        """
        Recommend videos with contextual information.
        Context can include: time of day, device, location.
        """
        # Get base recommendations
        base_recs = self.get_base_recommendations(user_id, k=k*2)
        # Re-rank based on context
        if context:
            final_recs = self.contextual_rerank(base_recs, context, k=k)
        else:
            final_recs = base_recs[:k]
        return final_recs

    def contextual_rerank(self, items, context, k):
        """
        Re-rank items based on contextual features
        """
        # Example: boost shorter videos in a mobile context
        if context.get('device') == 'mobile':
            items = sorted(items, key=lambda x: x['duration'])
        return items[:k]
Advantages and Limitations
Advantages
- Flexible Architecture: Can model complex non-linear interactions
- No Feature Engineering: Automatically learns feature representations
- State-of-the-art Performance: Often outperforms traditional methods
- Extensible: Easy to incorporate additional features and context
Limitations
- Computational Cost: Higher training and inference costs
- Data Hungry: Requires substantial training data
- Black Box Nature: Less interpretable than traditional methods
- Cold Start: Still struggles with completely new users/items without features
Best Practices
- Start Simple: Begin with a basic MLP, then try more complex architectures
- Tune Hyperparameters: Embedding dimension, layer sizes, learning rate are crucial
- Use Pre-training: Initialize with GMF/MLP for better convergence
- Monitor Overfitting: Use dropout, batch normalization, and validation sets
- Consider Hybrid Approaches: Combine NCF with content-based features
Future Directions
1. Attention Mechanisms
Incorporating attention to focus on relevant user-item interactions:
class AttentiveNCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32):
        super(AttentiveNCF, self).__init__()
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        # Attention mechanism: one weight per concatenated feature dimension
        self.attention = nn.Sequential(
            nn.Linear(embedding_dim * 2, 32),
            nn.ReLU(),
            nn.Linear(32, embedding_dim * 2),
            nn.Softmax(dim=-1)
        )
        # Prediction layer
        self.fc = nn.Linear(embedding_dim * 2, 1)

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        concat = torch.cat([user_embed, item_embed], dim=-1)
        # Re-weight each feature dimension by its attention weight
        attention_weights = self.attention(concat)
        weighted_concat = concat * attention_weights
        output = torch.sigmoid(self.fc(weighted_concat))
        return output.squeeze()
2. Graph Neural Networks
Leveraging user-item interaction graphs for richer representations.
3. Transformer-based Models
Using self-attention mechanisms to capture sequential patterns in user behavior.
Conclusion
Neural Collaborative Filtering represents a significant advancement in recommendation systems by replacing fixed interaction functions with learnable neural architectures. The framework's flexibility allows it to model complex user-item relationships while maintaining reasonable computational efficiency.
Key takeaways:
- NCF generalizes matrix factorization by learning the interaction function
- Multiple architecture variants (GMF, MLP, NeuMF) offer different trade-offs
- Proper training strategies and regularization are crucial for good performance
- The framework can be extended with additional features and constraints for real-world applications
While NCF has shown impressive results, it's important to consider the specific requirements of your application, including computational constraints, data availability, and the need for model interpretability.