
Neural Collaborative Filtering - DL for Recommendation Systems

14 min read
Duong Nguyen Thuan
AI/ML Engineer, MLOps Enthusiast

Neural Collaborative Filtering (NCF) represents a paradigm shift in recommendation systems by replacing traditional matrix factorization techniques with deep neural networks. This approach, introduced in the paper "Neural Collaborative Filtering" (He et al., 2017), leverages the power of deep learning to model complex user-item interactions more effectively.

Introduction to Recommendation Systems

Recommendation systems are everywhere in modern applications - from Netflix suggesting movies to Amazon recommending products. These systems aim to predict user preferences and suggest items that users are likely to interact with.

Traditional recommendation approaches include:

  • Content-Based Filtering: Recommends items similar to those a user has liked before
  • Collaborative Filtering: Leverages patterns from multiple users to make recommendations
  • Hybrid Methods: Combine multiple approaches for better results

Key Insight

Collaborative filtering is based on the idea that users who agreed in the past tend to agree in the future. If User A and User B both liked items 1, 2, and 3, they will likely agree on item 4.
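
To make this concrete, here is a minimal sketch (toy data, not from the original post) that scores user-user similarity from shared interactions:

# Toy example: user-user similarity from co-liked items (illustrative only)
import numpy as np

# Rows = users, columns = items; 1 = liked, 0 = no interaction
ratings = np.array([
    [1, 1, 1, 0],   # User A liked items 1-3
    [1, 1, 1, 1],   # User B liked items 1-4
    [0, 0, 1, 1],   # User C
])

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

sim_ab = cosine_similarity(ratings[0], ratings[1])
sim_ac = cosine_similarity(ratings[0], ratings[2])
print(f"sim(A, B) = {sim_ab:.2f}, sim(A, C) = {sim_ac:.2f}")
# A is most similar to B, so B's liking of item 4 is evidence that A will like it too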

Traditional Matrix Factorization

Before deep learning, Matrix Factorization (MF) was the dominant approach for collaborative filtering. The idea is simple yet powerful:

The Core Concept

Represent users and items as latent feature vectors in a shared embedding space. The interaction between user u and item i is modeled as the inner product of their latent vectors:

ŷᵤᵢ = pᵤᵀ · qᵢ

Where:

  • pᵤ is the latent vector for user u
  • qᵢ is the latent vector for item i
  • ŷᵤᵢ is the predicted interaction score

Limitations of Matrix Factorization

  1. Linear Assumption: The inner product is a linear operation, limiting its ability to capture complex user-item interactions
  2. Fixed Interaction Function: Cannot learn more sophisticated interaction patterns
  3. Limited Expressiveness: May not adequately represent the complexity of user preferences

# Traditional Matrix Factorization example
import numpy as np

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=20):
        self.user_factors = np.random.normal(0, 0.1, (n_users, n_factors))
        self.item_factors = np.random.normal(0, 0.1, (n_items, n_factors))

    def predict(self, user_id, item_id):
        # Simple dot product - linear interaction
        return np.dot(self.user_factors[user_id], self.item_factors[item_id])

Neural Collaborative Filtering (NCF)

NCF addresses the limitations of matrix factorization by using neural networks to learn the interaction function from data, rather than assuming a fixed inner product.

The NCF Framework

The NCF framework consists of three main components:

  1. Embedding Layer: Maps sparse user and item IDs to dense vectors
  2. Neural CF Layers: Learn the interaction function between user and item embeddings
  3. Output Layer: Predicts the interaction score

import torch
import torch.nn as nn

class NCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32, hidden_layers=[64, 32, 16]):
        super(NCF, self).__init__()

        # Embedding layers
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)

        # Neural CF layers
        layers = []
        input_dim = embedding_dim * 2
        for hidden_dim in hidden_layers:
            layers.append(nn.Linear(input_dim, hidden_dim))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))
            input_dim = hidden_dim

        self.fc_layers = nn.Sequential(*layers)

        # Output layer
        self.output_layer = nn.Linear(hidden_layers[-1], 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        # Get embeddings
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)

        # Concatenate embeddings
        x = torch.cat([user_embed, item_embed], dim=-1)

        # Pass through neural layers
        x = self.fc_layers(x)

        # Get prediction
        output = self.sigmoid(self.output_layer(x))

        return output.squeeze()
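
As a quick sanity check (not part of the original post), the model can be instantiated with arbitrary sizes and run on a batch of random IDs:

# Smoke test with arbitrary sizes (illustrative only)
ncf = NCF(n_users=1000, n_items=2000, embedding_dim=32)

user_ids = torch.randint(0, 1000, (8,))   # batch of 8 user IDs
item_ids = torch.randint(0, 2000, (8,))   # batch of 8 item IDs

scores = ncf(user_ids, item_ids)
print(scores.shape)  # torch.Size([8]); each score lies in (0, 1)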

Architecture Variants

The original NCF paper proposed three specific models:

1. Generalized Matrix Factorization (GMF)

GMF generalizes traditional matrix factorization by passing the element-wise product of the user and item embeddings through a learnable linear layer:

class GMF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32):
        super(GMF, self).__init__()

        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        self.output_layer = nn.Linear(embedding_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)

        # Element-wise product
        interaction = user_embed * item_embed

        output = self.sigmoid(self.output_layer(interaction))
        return output.squeeze()

2. Multi-Layer Perceptron (MLP)

MLP uses multiple neural layers to learn the interaction function:

class MLP(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32, layers=[64, 32, 16, 8]):
        super(MLP, self).__init__()

        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)

        mlp_modules = []
        input_size = embedding_dim * 2
        for layer_size in layers:
            mlp_modules.append(nn.Linear(input_size, layer_size))
            mlp_modules.append(nn.ReLU())
            mlp_modules.append(nn.BatchNorm1d(layer_size))
            mlp_modules.append(nn.Dropout(0.2))
            input_size = layer_size

        self.mlp_layers = nn.Sequential(*mlp_modules)
        self.output_layer = nn.Linear(layers[-1], 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)

        # Concatenate embeddings
        x = torch.cat([user_embed, item_embed], dim=-1)
        x = self.mlp_layers(x)

        output = self.sigmoid(self.output_layer(x))
        return output.squeeze()

3. Neural Matrix Factorization (NeuMF)

NeuMF combines GMF and MLP to leverage both linear and non-linear interactions:

class NeuMF(nn.Module):
    def __init__(self, n_users, n_items, mf_dim=8, mlp_dim=32, layers=[64, 32, 16, 8]):
        super(NeuMF, self).__init__()

        # GMF embeddings
        self.user_embedding_gmf = nn.Embedding(n_users, mf_dim)
        self.item_embedding_gmf = nn.Embedding(n_items, mf_dim)

        # MLP embeddings
        self.user_embedding_mlp = nn.Embedding(n_users, mlp_dim)
        self.item_embedding_mlp = nn.Embedding(n_items, mlp_dim)

        # MLP layers
        mlp_modules = []
        input_size = mlp_dim * 2
        for layer_size in layers:
            mlp_modules.append(nn.Linear(input_size, layer_size))
            mlp_modules.append(nn.ReLU())
            mlp_modules.append(nn.BatchNorm1d(layer_size))
            mlp_modules.append(nn.Dropout(0.2))
            input_size = layer_size

        self.mlp_layers = nn.Sequential(*mlp_modules)

        # Final prediction layer
        self.output_layer = nn.Linear(mf_dim + layers[-1], 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, user_ids, item_ids):
        # GMF part
        user_embed_gmf = self.user_embedding_gmf(user_ids)
        item_embed_gmf = self.item_embedding_gmf(item_ids)
        gmf_output = user_embed_gmf * item_embed_gmf

        # MLP part
        user_embed_mlp = self.user_embedding_mlp(user_ids)
        item_embed_mlp = self.item_embedding_mlp(item_ids)
        mlp_input = torch.cat([user_embed_mlp, item_embed_mlp], dim=-1)
        mlp_output = self.mlp_layers(mlp_input)

        # Concatenate GMF and MLP outputs
        concat = torch.cat([gmf_output, mlp_output], dim=-1)
        output = self.sigmoid(self.output_layer(concat))

        return output.squeeze()

Training Neural Collaborative Filtering

Data Preparation

NCF typically works with implicit feedback (clicks, views, purchases) rather than explicit ratings:

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class ImplicitFeedbackDataset(Dataset):
    def __init__(self, interactions, n_users, n_items, negative_samples=4):
        """
        Args:
            interactions: DataFrame with columns [user_id, item_id]
            n_users: Total number of users
            n_items: Total number of items
            negative_samples: Number of negative samples per positive sample
        """
        self.interactions = interactions
        self.n_users = n_users
        self.n_items = n_items
        self.negative_samples = negative_samples

        # Build per-user interaction sets for negative sampling
        self.user_items = interactions.groupby('user_id')['item_id'].apply(set).to_dict()

    def __len__(self):
        return len(self.interactions) * (1 + self.negative_samples)

    def __getitem__(self, idx):
        # Determine if this is a positive or negative sample
        pos_idx = idx // (1 + self.negative_samples)
        is_positive = (idx % (1 + self.negative_samples)) == 0

        user_id = self.interactions.iloc[pos_idx]['user_id']

        if is_positive:
            item_id = self.interactions.iloc[pos_idx]['item_id']
            label = 1.0
        else:
            # Sample a negative item (one the user hasn't interacted with)
            while True:
                item_id = np.random.randint(0, self.n_items)
                if item_id not in self.user_items.get(user_id, set()):
                    break
            label = 0.0

        return {
            'user_id': torch.tensor(user_id, dtype=torch.long),
            'item_id': torch.tensor(item_id, dtype=torch.long),
            'label': torch.tensor(label, dtype=torch.float)
        }
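
To wire this dataset into training, a DataLoader can be built from an interactions DataFrame; the toy interactions and sizes below are made up for illustration:

# Illustrative usage; the toy interactions and sizes are assumptions, not real data
interactions = pd.DataFrame({
    'user_id': [0, 0, 1, 2],
    'item_id': [10, 42, 7, 10],
})

train_dataset = ImplicitFeedbackDataset(interactions, n_users=1000, n_items=2000, negative_samples=4)
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)

batch = next(iter(train_loader))
print(batch['user_id'].shape, batch['label'][:5])  # each batch mixes positives and sampled negatives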

Loss Function

For implicit feedback, binary cross-entropy loss is commonly used:

def train_epoch(model, dataloader, optimizer, device):
    model.train()
    total_loss = 0
    criterion = nn.BCELoss()

    for batch in dataloader:
        user_ids = batch['user_id'].to(device)
        item_ids = batch['item_id'].to(device)
        labels = batch['label'].to(device)

        # Forward pass
        predictions = model(user_ids, item_ids)
        loss = criterion(predictions, labels)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    return total_loss / len(dataloader)

Evaluation Metrics

Common metrics for recommendation systems include:

  • Hit Rate (HR@K): Percentage of test cases where the true item appears in top-K recommendations
  • Normalized Discounted Cumulative Gain (NDCG@K): Measures ranking quality with position-based discounting

def evaluate_model(model, test_data, k=10):
    """
    Evaluate model using Hit Rate and NDCG
    """
    model.eval()
    hits = []
    ndcgs = []

    with torch.no_grad():
        for user_id, true_item, candidates in test_data:
            # Get predictions for all candidate items
            user_ids = torch.tensor([user_id] * len(candidates))
            item_ids = torch.tensor(candidates)

            predictions = model(user_ids, item_ids)

            # Get top-K recommendations
            _, top_k_indices = torch.topk(predictions, k)
            recommended_items = [candidates[i] for i in top_k_indices]

            # Calculate Hit Rate
            hit = 1.0 if true_item in recommended_items else 0.0
            hits.append(hit)

            # Calculate NDCG
            if true_item in recommended_items:
                rank = recommended_items.index(true_item) + 1
                ndcg = 1.0 / np.log2(rank + 1)
            else:
                ndcg = 0.0
            ndcgs.append(ndcg)

    return {
        'HR@{}'.format(k): np.mean(hits),
        'NDCG@{}'.format(k): np.mean(ndcgs)
    }
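
evaluate_model expects test_data as (user_id, true_item, candidates) tuples. A common protocol (used in the original NCF paper) is leave-one-out evaluation: hold out one interaction per user and rank it against 99 sampled negatives. The helper below is a sketch under that assumption; held_out and user_items are hypothetical inputs:

# Sketch: build (user_id, true_item, candidates) tuples for leave-one-out evaluation.
# 'held_out' maps each user to their held-out positive item; 'user_items' maps users to seen items.
import numpy as np

def build_test_data(held_out, user_items, n_items, n_negatives=99):
    test_data = []
    for user_id, true_item in held_out.items():
        seen = user_items.get(user_id, set())
        candidates = [true_item]
        while len(candidates) < n_negatives + 1:
            item = np.random.randint(0, n_items)
            if item != true_item and item not in seen and item not in candidates:
                candidates.append(item)
        test_data.append((user_id, true_item, candidates))
    return test_data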

Complete Training Pipeline

Here's a complete example of training an NCF model:

import torch
import torch.optim as optim
from torch.utils.data import DataLoader

# Hyperparameters
n_users = 1000
n_items = 2000
embedding_dim = 32
hidden_layers = [64, 32, 16, 8]
learning_rate = 0.001
batch_size = 256
n_epochs = 20

# Initialize model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = NeuMF(n_users, n_items, mf_dim=8, mlp_dim=32, layers=hidden_layers).to(device)

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)

# Learning rate scheduler
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3, verbose=True
)

# Training loop
# Assumes train_loader (a DataLoader over ImplicitFeedbackDataset) and test_data
# ((user_id, true_item, candidates) tuples) have been prepared as shown earlier
best_hr = 0
for epoch in range(n_epochs):
    # Train
    train_loss = train_epoch(model, train_loader, optimizer, device)

    # Evaluate
    metrics = evaluate_model(model, test_data, k=10)

    # Update learning rate
    scheduler.step(train_loss)

    # Print progress
    print(f"Epoch {epoch+1}/{n_epochs}")
    print(f"  Train Loss: {train_loss:.4f}")
    print(f"  HR@10: {metrics['HR@10']:.4f}")
    print(f"  NDCG@10: {metrics['NDCG@10']:.4f}")

    # Save best model
    if metrics['HR@10'] > best_hr:
        best_hr = metrics['HR@10']
        torch.save(model.state_dict(), 'best_ncf_model.pth')
        print("  New best model saved!")

Advanced Techniques

1. Pre-training Strategy

The original NCF paper suggests pre-training GMF and MLP separately, then using their weights to initialize NeuMF:

def pretrain_neumf(gmf_model, mlp_model, alpha=0.5):
    """
    Initialize NeuMF with pre-trained GMF and MLP models

    Args:
        gmf_model: Pre-trained GMF model
        mlp_model: Pre-trained MLP model
        alpha: Weight for combining GMF and MLP predictions
    """
    # n_users and n_items are assumed to be defined in the surrounding scope.
    # Dimensions must line up: GMF's embedding_dim must equal NeuMF's mf_dim,
    # and the MLP's layer sizes must match NeuMF's MLP tower.
    neumf = NeuMF(n_users, n_items)

    # Transfer GMF embeddings
    neumf.user_embedding_gmf.weight.data = gmf_model.user_embedding.weight.data
    neumf.item_embedding_gmf.weight.data = gmf_model.item_embedding.weight.data

    # Transfer MLP embeddings
    neumf.user_embedding_mlp.weight.data = mlp_model.user_embedding.weight.data
    neumf.item_embedding_mlp.weight.data = mlp_model.item_embedding.weight.data

    # Transfer MLP layers
    neumf.mlp_layers.load_state_dict(mlp_model.mlp_layers.state_dict())

    # Initialize the output layer as a weighted combination of the two pre-trained heads
    neumf.output_layer.weight.data = torch.cat([
        alpha * gmf_model.output_layer.weight.data,
        (1 - alpha) * mlp_model.output_layer.weight.data
    ], dim=1)

    return neumf

2. Regularization Techniques

import torch.nn.functional as F

class RegularizedNCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32, dropout=0.2):
        super(RegularizedNCF, self).__init__()

        # Embeddings with weight regularization
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)

        # Initialize with smaller values
        nn.init.normal_(self.user_embedding.weight, std=0.01)
        nn.init.normal_(self.item_embedding.weight, std=0.01)

        # Layers with dropout
        self.fc1 = nn.Linear(embedding_dim * 2, 64)
        self.dropout1 = nn.Dropout(dropout)
        self.bn1 = nn.BatchNorm1d(64)

        self.fc2 = nn.Linear(64, 32)
        self.dropout2 = nn.Dropout(dropout)
        self.bn2 = nn.BatchNorm1d(32)

        self.fc3 = nn.Linear(32, 16)
        self.dropout3 = nn.Dropout(dropout)
        self.bn3 = nn.BatchNorm1d(16)

        self.output = nn.Linear(16, 1)

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)

        x = torch.cat([user_embed, item_embed], dim=-1)

        x = self.fc1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)

        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.dropout2(x)

        x = self.fc3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.dropout3(x)

        output = torch.sigmoid(self.output(x))
        return output.squeeze()

3. Handling Cold Start Problem

For new users or items with few interactions:

class NCFWithSideInfo(nn.Module):
    """NCF with additional side information (features) for cold start"""

    def __init__(self, n_users, n_items, user_features_dim, item_features_dim,
                 embedding_dim=32):
        super(NCFWithSideInfo, self).__init__()

        # ID embeddings
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)

        # Feature projection layers
        self.user_feature_proj = nn.Linear(user_features_dim, embedding_dim)
        self.item_feature_proj = nn.Linear(item_features_dim, embedding_dim)

        # Interaction layers
        self.fc1 = nn.Linear(embedding_dim * 2, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 16)
        self.output = nn.Linear(16, 1)

    def forward(self, user_ids, item_ids, user_features=None, item_features=None):
        # Get ID embeddings
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)

        # Incorporate features if available
        if user_features is not None:
            user_feat_embed = self.user_feature_proj(user_features)
            user_embed = user_embed + user_feat_embed

        if item_features is not None:
            item_feat_embed = self.item_feature_proj(item_features)
            item_embed = item_embed + item_feat_embed

        # Standard NCF forward pass
        x = torch.cat([user_embed, item_embed], dim=-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        output = torch.sigmoid(self.output(x))

        return output.squeeze()

Practical Considerations

1. Scalability

For large-scale systems with millions of users and items:

# Use mixed precision training
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for batch in dataloader:
    user_ids = batch['user_id'].to(device)
    item_ids = batch['item_id'].to(device)
    labels = batch['label'].to(device)

    optimizer.zero_grad()
    with autocast():
        predictions = model(user_ids, item_ids)

    # BCELoss is not autocast-safe, so compute the loss in full precision
    loss = criterion(predictions.float(), labels)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

2. Online Learning

Update model with new user interactions:

class OnlineNCF:
    def __init__(self, model, learning_rate=0.001):
        self.model = model
        self.optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        self.criterion = nn.BCELoss()

    def update(self, user_id, item_id, label):
        """Update model with a single new interaction"""
        self.model.train()

        user_id = torch.tensor([user_id], dtype=torch.long)
        item_id = torch.tensor([item_id], dtype=torch.long)
        label = torch.tensor([label], dtype=torch.float)

        # view(-1) keeps prediction and label shapes aligned for a single-sample batch
        prediction = self.model(user_id, item_id).view(-1)
        loss = self.criterion(prediction, label)

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        return loss.item()

3. A/B Testing Framework

class RecommendationABTest:
    def __init__(self, model_a, model_b):
        self.model_a = model_a
        self.model_b = model_b
        self.results = {'A': [], 'B': []}

    def get_recommendations(self, user_id, candidate_items, variant):
        """Get recommendations from specified model variant"""
        model = self.model_a if variant == 'A' else self.model_b

        user_ids = torch.tensor([user_id] * len(candidate_items))
        item_ids = torch.tensor(candidate_items)

        with torch.no_grad():
            scores = model(user_ids, item_ids)

        # Return top-K items
        _, top_indices = torch.topk(scores, k=10)
        return [candidate_items[i] for i in top_indices]

    def record_outcome(self, variant, clicked):
        """Record whether user clicked on any recommended item"""
        self.results[variant].append(1 if clicked else 0)

    def get_statistics(self):
        """Calculate CTR for each variant"""
        ctr_a = np.mean(self.results['A'])
        ctr_b = np.mean(self.results['B'])

        return {
            'CTR_A': ctr_a,
            'CTR_B': ctr_b,
            'Improvement': (ctr_b - ctr_a) / ctr_a * 100
        }

Comparison with Other Methods

NCF vs Traditional Collaborative Filtering

Aspect               | Traditional CF      | NCF
---------------------|---------------------|-------------------
Interaction Function | Fixed (dot product) | Learned from data
Non-linearity        | Limited             | High capacity
Feature Engineering  | Manual              | Automatic
Training Complexity  | Lower               | Higher
Expressiveness       | Limited             | High
Interpretability     | Better              | More challenging

Performance Benchmarks

Based on the original paper's experiments on MovieLens and Pinterest datasets:

# Example results (HR@10)
results = {
    'MF': 0.692,
    'GMF': 0.708,
    'MLP': 0.713,
    'NeuMF': 0.726  # Best performance
}

# Relative improvements
improvement_over_mf = (results['NeuMF'] - results['MF']) / results['MF']
print(f"NeuMF improves HR@10 by {improvement_over_mf*100:.2f}% over traditional MF")

Real-World Applications

1. E-commerce Product Recommendations

class ProductRecommender:
    def __init__(self, model, item_catalog):
        self.model = model
        self.item_catalog = item_catalog  # Product metadata

    def recommend_products(self, user_id, exclude_purchased=True,
                           category_filter=None, k=10):
        """
        Recommend products for a user with business constraints
        """
        # Get candidate items (placeholder helper: fetch eligible items from the catalog)
        candidate_items = self.get_candidate_items(
            user_id,
            exclude_purchased=exclude_purchased,
            category_filter=category_filter
        )

        # Get predictions
        user_ids = torch.tensor([user_id] * len(candidate_items))
        item_ids = torch.tensor(candidate_items)

        with torch.no_grad():
            scores = self.model(user_ids, item_ids)

        # Apply business rules, e.g. diversity, freshness (placeholder helper)
        recommendations = self.apply_business_rules(
            candidate_items, scores, k=k
        )

        return recommendations

2. Content Streaming Services

class VideoRecommender:
    def __init__(self, model):
        self.model = model

    def recommend_videos(self, user_id, context=None, k=10):
        """
        Recommend videos with contextual information
        Context can include: time of day, device, location
        """
        # Get base recommendations (placeholder helper that scores candidates with self.model)
        base_recs = self.get_base_recommendations(user_id, k=k*2)

        # Re-rank based on context
        if context:
            final_recs = self.contextual_rerank(base_recs, context, k=k)
        else:
            final_recs = base_recs[:k]

        return final_recs

    def contextual_rerank(self, items, context, k):
        """
        Re-rank items based on contextual features
        """
        # Example: boost shorter videos in mobile context
        if context.get('device') == 'mobile':
            items = sorted(items, key=lambda x: x['duration'])

        return items[:k]

Advantages and Limitations

Advantages

  1. Flexible Architecture: Can model complex non-linear interactions
  2. No Feature Engineering: Automatically learns feature representations
  3. State-of-the-art Performance: Often outperforms traditional methods
  4. Extensible: Easy to incorporate additional features and context

Limitations

  1. Computational Cost: Higher training and inference costs
  2. Data Hungry: Requires substantial training data
  3. Black Box Nature: Less interpretable than traditional methods
  4. Cold Start: Still struggles with completely new users/items without features

Best Practices

  1. Start Simple: Begin with a basic MLP, then try more complex architectures
  2. Tune Hyperparameters: Embedding dimension, layer sizes, and learning rate are crucial (see the small search sketch after this list)
  3. Use Pre-training: Initialize NeuMF with pre-trained GMF/MLP weights for better convergence
  4. Monitor Overfitting: Use dropout, batch normalization, and a validation set
  5. Consider Hybrid Approaches: Combine NCF with content-based features
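
For the hyperparameter point above, a small grid search over embedding dimension and learning rate is a reasonable starting point. The sketch below reuses train_epoch and evaluate_model from earlier and assumes train_loader and test_data exist; the grid values are arbitrary:

# Minimal grid-search sketch (arbitrary grid; assumes train_loader/test_data from earlier)
best_config, best_hr = None, 0.0
for embedding_dim in [8, 16, 32]:
    for lr in [1e-3, 5e-4]:
        model = NCF(n_users, n_items, embedding_dim=embedding_dim).to(device)
        optimizer = optim.Adam(model.parameters(), lr=lr)
        for epoch in range(5):
            train_epoch(model, train_loader, optimizer, device)
        hr = evaluate_model(model, test_data, k=10)['HR@10']
        if hr > best_hr:
            best_hr, best_config = hr, {'embedding_dim': embedding_dim, 'lr': lr}
print(best_config, best_hr)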

Future Directions

1. Attention Mechanisms

Incorporating attention to focus on relevant user-item interactions:

class AttentiveNCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=32):
        super(AttentiveNCF, self).__init__()

        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)

        # Attention mechanism: score the interaction and gate it with a sigmoid
        # (a softmax over a single score would always equal 1)
        self.attention = nn.Sequential(
            nn.Linear(embedding_dim * 2, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

        # Prediction layers
        self.fc = nn.Linear(embedding_dim * 2, 1)

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)

        concat = torch.cat([user_embed, item_embed], dim=-1)

        # Apply the attention weight to the concatenated representation
        attention_weights = self.attention(concat)
        weighted_concat = concat * attention_weights

        output = torch.sigmoid(self.fc(weighted_concat))
        return output.squeeze()

2. Graph Neural Networks

Leveraging user-item interaction graphs for richer representations.
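
As a rough sketch of the idea (LightGCN-style, not from the NCF paper), one propagation step averages neighbor embeddings over the normalized user-item adjacency:

# Rough sketch of one LightGCN-style propagation step (illustrative, not part of NCF)
import torch

def propagate(user_emb, item_emb, interaction):
    """interaction: (n_users, n_items) binary matrix of observed interactions."""
    # Symmetric normalization of the bipartite adjacency
    user_deg = interaction.sum(dim=1, keepdim=True).clamp(min=1)
    item_deg = interaction.sum(dim=0, keepdim=True).clamp(min=1)
    norm_adj = interaction / torch.sqrt(user_deg) / torch.sqrt(item_deg)

    # Each user aggregates its items' embeddings, and vice versa
    new_user_emb = norm_adj @ item_emb
    new_item_emb = norm_adj.t() @ user_emb
    return new_user_emb, new_item_emb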

3. Transformer-based Models

Using self-attention mechanisms to capture sequential patterns in user behavior.
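
A minimal sketch of this direction (SASRec-style, not part of the original NCF framework): encode a user's item history with self-attention and score a candidate item against the final hidden state.

# Minimal sketch of self-attention over a user's item history (illustrative only)
import torch
import torch.nn as nn

class SequentialRecommender(nn.Module):
    def __init__(self, n_items, embedding_dim=32, n_heads=2, n_layers=1):
        super().__init__()
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

    def forward(self, history_item_ids, candidate_item_ids):
        # history_item_ids: (batch, seq_len); candidate_item_ids: (batch,)
        history = self.item_embedding(history_item_ids)
        encoded = self.encoder(history)        # (batch, seq_len, dim)
        user_state = encoded[:, -1, :]         # representation after the last item
        candidate = self.item_embedding(candidate_item_ids)
        return torch.sigmoid((user_state * candidate).sum(dim=-1))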

Conclusion

Neural Collaborative Filtering represents a significant advancement in recommendation systems by replacing fixed interaction functions with learnable neural architectures. The framework's flexibility allows it to model complex user-item relationships while maintaining reasonable computational efficiency.

Key takeaways:

  • NCF generalizes matrix factorization by learning the interaction function
  • Multiple architecture variants (GMF, MLP, NeuMF) offer different trade-offs
  • Proper training strategies and regularization are crucial for good performance
  • The framework can be extended with additional features and constraints for real-world applications

While NCF has shown impressive results, it's important to consider the specific requirements of your application, including computational constraints, data availability, and the need for model interpretability.

Additional Resources