Movie Recommendation System

ML & Systems Engineer

Designed and implemented a production-grade recommendation platform for a streaming environment serving ~1M users across a ~20K movie catalog.

The system was built around real-time serving, event-driven telemetry, and operational controls that support continuous model improvement without service disruption. Instead of optimizing for offline scores alone, the architecture prioritizes deployment safety, a 600ms latency SLA, and cost-aware ranking behavior.

It supports recommendation generation at scale, model lifecycle governance, and experiment-driven iteration through integrated monitoring, alerting, and A/B routing workflows.

Python · FastAPI · Apache Kafka · Docker · GitHub Actions · Model Registry · A/B Testing · Drift Detection

Problem

A streaming platform needs recommendation infrastructure that is scalable and reliable under real production constraints, not merely accurate in offline evaluation. The challenge is to serve personalized recommendations at large scale within a strict latency budget while managing cold start, shifting user behavior, ranking cost, and safe deployment operations.
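One of the constraints above, cold start, can be illustrated with a minimal fallback policy: users with no interaction history receive popularity-ranked titles until personalized scores are available. The function names and data shapes here are hypothetical, not the platform's actual API.

```python
# Hypothetical cold-start fallback: new users with no watch history get
# popularity-ranked titles instead of personalized model scores.
def recommend(user_id, history, popular_movies, model_scores, k=20):
    """Return up to k movie IDs; fall back to popularity for cold users."""
    if not history.get(user_id):            # cold start: no events yet
        return popular_movies[:k]
    scored = model_scores(user_id)          # {movie_id: score}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:k]
```

A real policy would likely blend popularity with available context (genre preferences, region), but the shape of the decision is the same: detect missing history, then degrade gracefully.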

Solution

The platform was designed as an end-to-end MLOps system spanning ingestion, feature engineering, model training, deployment, monitoring, and experimentation. Kafka streams capture recommendation requests, watch events, ratings, and account creation events, while metadata APIs enrich user and movie context for ranking pipelines.
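The event types consumed from Kafka can be sketched as a small routing step that normalizes raw JSON payloads into typed records before they reach feature pipelines. The field names and event schema here are assumptions for illustration, not the platform's actual topic contracts.

```python
import json

# Hypothetical event schema; real topic contracts are assumptions here.
def parse_event(raw: str) -> dict:
    """Normalize a raw JSON event from the stream into a typed record."""
    event = json.loads(raw)
    kind = event["type"]                    # e.g. "watch", "rate", "recommend"
    if kind == "watch":
        return {"user": event["user_id"], "movie": event["movie_id"],
                "minutes": event.get("minutes", 0), "type": "watch"}
    if kind == "rate":
        return {"user": event["user_id"], "movie": event["movie_id"],
                "rating": event["rating"], "type": "rate"}
    # recommendation requests, account creation, etc. pass through
    return {"user": event.get("user_id"), "type": kind}
```

In production this logic would sit inside a Kafka consumer loop; keeping the parsing pure makes it easy to unit-test independently of the broker.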

Multiple model candidates are trained and compared across quality and operational efficiency, then promoted through controlled release workflows. The inference service exposes a low-latency `/recommend/<userid>` endpoint returning up to 20 ranked movie IDs under a 600ms SLA, with telemetry integrated into monitoring and experiment loops for continuous iteration.
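The 600ms SLA implies a serving-side guard: if personalized ranking is slow or fails, the service degrades to a precomputed list rather than missing the deadline. The sketch below is a simplified stand-in for that behavior, with a hypothetical fallback list and ranking callable; the real service's internals are not shown here.

```python
import time

POPULAR = list(range(100))   # hypothetical precomputed fallback ranking

def recommend_with_sla(user_id, rank_fn, budget_s=0.6, k=20):
    """Serve up to k movie IDs; degrade to the popular list if ranking
    fails or would blow the 600 ms budget (a simplified guard, not the
    actual service implementation)."""
    start = time.monotonic()
    try:
        ranked = rank_fn(user_id)
    except Exception:
        ranked = POPULAR                    # fail safe, never fail closed
    if time.monotonic() - start > budget_s:
        ranked = POPULAR                    # degrade rather than miss the SLA
    return ranked[:k]
```

In the FastAPI service this guard would wrap the model call inside the `/recommend/<userid>` handler, so latency violations surface as degraded quality in telemetry instead of timeouts at the client.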

System Architecture

Engineering Decisions

Validation & Iteration Strategy

Model development followed a comparative workflow across at least two recommendation approaches, using leakage-safe offline evaluation to filter candidates before deployment. Online telemetry from recommendation requests, watch behavior, and cost signals was then used to guide ranking adjustments and retraining priorities.
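Leakage-safe offline evaluation typically comes down to splitting interactions by time, so the model never trains on events that postdate what it is asked to predict. A minimal sketch of that split plus a hit-rate metric, with assumed field names:

```python
def temporal_split(events, cutoff):
    """Leakage-safe split: train only on interactions before the cutoff,
    evaluate on interactions at or after it (per-event timestamps assumed)."""
    train = [e for e in events if e["ts"] < cutoff]
    test = [e for e in events if e["ts"] >= cutoff]
    return train, test

def hit_rate_at_k(recs, held_out_items, k=20):
    """Fraction of held-out items that appear in the top-k recommendations."""
    top = set(recs[:k])
    hits = sum(1 for m in held_out_items if m in top)
    return hits / len(held_out_items) if held_out_items else 0.0
```

A random split would leak future behavior into training and inflate offline scores; the temporal cutoff is what makes the comparison between candidate models trustworthy.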

A/B testing and traffic splitting were used to evaluate model changes, cold-start policies, and cost-aware ranking tradeoffs under production conditions. Iteration decisions were driven primarily by production feedback because live user behavior and system constraints provide the most reliable signal for long-term platform quality.
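Traffic splitting of this kind is commonly implemented with deterministic hashing, so each user sees a stable variant for the lifetime of an experiment without any assignment state to store. A minimal sketch, with a hypothetical experiment name:

```python
import hashlib

def assign_variant(user_id, experiment, split=0.5):
    """Deterministic A/B bucketing: hash (experiment, user) into [0, 1) so
    the same user always lands in the same arm of a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "treatment" if bucket < split else "control"
```

Salting the hash with the experiment name re-randomizes assignment across experiments, which prevents the same cohort of users from always landing in the treatment arm.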

My Role
