Movie Recommendation System
ML & Systems Engineer
Designed and implemented a production-grade recommendation platform for a streaming environment serving ~1M users across a ~20K movie catalog.
The system was built around real-time serving, event-driven telemetry, and operational controls that support continuous model improvement without service disruption. Instead of optimizing for offline scores alone, the architecture prioritizes deployment safety, a 600ms latency SLA, and cost-aware ranking behavior.
It supports recommendation generation at scale, model lifecycle governance, and experiment-driven iteration through integrated monitoring, alerting, and A/B routing workflows.
Problem
A streaming platform needs recommendation infrastructure that is scalable and reliable under real production constraints, not just accurate in offline evaluation. The challenge is to serve personalized recommendations at large scale with strict latency expectations while managing cold start, shifting behavior patterns, ranking cost, and safe deployment operations.
- Cold-start users require useful ranking before meaningful interaction history exists.
- Ranking decisions must balance predicted relevance and license-cost awareness.
- Model quality must be evaluated online under drift and deployment constraints, not in isolation.
Solution
The platform was designed as an end-to-end MLOps system spanning ingestion, feature engineering, model training, deployment, monitoring, and experimentation. Kafka streams capture recommendation requests, watch events, ratings, and account creation events, while metadata APIs enrich user and movie context for ranking pipelines.
Multiple model candidates are trained and compared across quality and operational efficiency, then promoted through controlled release workflows. The inference service exposes a low-latency `/recommend/<userid>` endpoint returning up to 20 ranked movie IDs under a 600ms SLA, with telemetry integrated into monitoring and experiment loops for continuous iteration.
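The serving contract above can be sketched as follows. This is a minimal illustration of the `/recommend/<userid>` response logic, not the deployed service; `score_candidates` is a hypothetical stand-in for the real model call, and the actual service wraps this in a containerized HTTP endpoint.

```python
def score_candidates(user_id: str) -> dict[str, float]:
    # Placeholder scores; the real service computes these from
    # user and movie features looked up at request time.
    return {f"movie_{i}": 1.0 / (i + 1) for i in range(100)}

def recommend(user_id: str, limit: int = 20) -> str:
    """Return up to `limit` movie IDs, highest-scored first,
    as the comma-separated payload the endpoint serves."""
    scores = score_candidates(user_id)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ",".join(ranked[:limit])
```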
System Architecture
- Kafka ingestion: Event streams for recommendation requests, watch behavior, ratings, and account events feed training and monitoring systems.
- Metadata APIs: User and movie metadata enrichment services provide additional context for feature construction and ranking constraints.
- Training pipeline: Leakage-safe dataset preparation, feature generation, and multi-model training are orchestrated in modular jobs.
- Model registry: Candidate models are versioned and promoted with provenance metadata and rollback support.
- Inference service: Containerized API service serves ranked results in real time with request-level logging.
- Monitoring stack: Availability, drift signals, online accuracy proxies, and recommendation cost are tracked with alerting policies.
- CI/CD: Automated test, build, and deployment workflows enforce reliability and release consistency.
- Experiment routing: A/B traffic splitting controls rollout of new ranking strategies and cold-start policies.
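The ingestion layer above can be sketched as typed event routing: messages from each Kafka topic are parsed, validated, and dispatched to the pipelines that consume them. Topic names, field names, and the in-memory dispatcher here are illustrative; the production system consumes from Kafka and applies full schema validation.

```python
import json
from collections import defaultdict

# Registry mapping topic name -> list of handler functions.
HANDLERS = defaultdict(list)

def on(topic):
    """Register a handler for a given event topic."""
    def register(fn):
        HANDLERS[topic].append(fn)
        return fn
    return register

@on("watch")
def track_watch(event):
    # Feed watch behavior into training/monitoring pipelines.
    return ("training", event["user_id"], event["movie_id"])

@on("rating")
def track_rating(event):
    return ("training", event["user_id"], event["rating"])

def dispatch(topic: str, raw: str):
    event = json.loads(raw)  # schema validation would go here
    return [handler(event) for handler in HANDLERS[topic]]
```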
Engineering Decisions
- Selected model candidates using latency-aware and cost-aware tradeoffs instead of ranking quality alone.
- Integrated cost-aware ranking logic so recommendation relevance and licensing constraints could be optimized jointly.
- Separated offline evaluation from online validation to avoid over-trusting offline metrics for deployment decisions.
- Handled cold start through user self-descriptions to bootstrap recommendations before sufficient interaction history exists.
- Containerized inference services to standardize runtime environments and enable safer model switching.
- Designed monitoring around operationally actionable signals, including drift, availability, and ranking-cost behavior.
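The cost-aware ranking decision above can be sketched as a joint score: predicted relevance minus a tunable penalty on per-stream license cost. The `cost_weight` knob and the input values are illustrative, not values from the deployed system.

```python
def cost_aware_rank(candidates, cost_weight=0.5):
    """candidates: list of (movie_id, relevance, license_cost) tuples.
    Returns movie IDs ordered by relevance net of a cost penalty."""
    scored = [(mid, rel - cost_weight * cost) for mid, rel, cost in candidates]
    return [mid for mid, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```

Jointly scoring relevance and cost lets a slightly less relevant but much cheaper title outrank an expensive one, which is the tradeoff the ranking layer is meant to control.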
Validation & Iteration Strategy
Model development followed a comparative workflow across at least two recommendation approaches, using leakage-safe offline evaluation to filter candidates before deployment. Online telemetry from recommendation requests, watch behavior, and cost signals was then used to guide ranking adjustments and retraining priorities.
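One common form of leakage-safe dataset preparation is a strict temporal split: every training interaction precedes every evaluation interaction, so the model is never scored on behavior it could have seen during training. This is an illustrative sketch, not the project's exact splitting code.

```python
def temporal_split(interactions, cutoff_ts):
    """interactions: list of (user_id, movie_id, timestamp) tuples.
    Train on everything before the cutoff, evaluate on everything after."""
    train = [x for x in interactions if x[2] < cutoff_ts]
    test = [x for x in interactions if x[2] >= cutoff_ts]
    return train, test
```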
A/B testing and traffic splitting were used to evaluate model changes, cold-start policies, and cost-aware ranking tradeoffs under production conditions. Iteration decisions were driven primarily by production feedback because live user behavior and system constraints provide the most reliable signal for long-term platform quality.
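Sticky A/B traffic splitting of the kind described above is typically implemented with a deterministic hash of the user ID, so a user always lands in the same variant for a given experiment. The salt and split ratio below are illustrative assumptions.

```python
import hashlib

def assign_variant(user_id: str, treatment_share: float = 0.2,
                   salt: str = "exp1") -> str:
    """Deterministically bucket a user into treatment or control."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # uniform in [0, 1)
    return "treatment" if bucket < treatment_share else "control"
```

Changing the salt reshuffles assignments, which keeps consecutive experiments statistically independent without storing per-user state.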
My Role
- Owned end-to-end system design from ingestion architecture through online serving and monitoring.
- Built and compared multiple recommendation models with deployment-focused tradeoff analysis.
- Implemented data infrastructure for event streaming, enrichment, schema validation, and drift monitoring.
- Set up CI/CD pipelines, containerized inference, and zero-downtime model release patterns.
- Implemented monitoring, alerting, and provenance tracking across model, pipeline, and dataset versions.
- Designed and operated A/B experimentation workflows for production ranking iteration.
- Led fairness, safety, security, and feedback-loop risk analysis for deployment readiness.
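Drift signals like those tracked in the monitoring stack are often computed as a Population Stability Index (PSI) between a baseline feature distribution and the live one; values near zero mean no shift. This is an illustrative sketch of the metric, not the system's exact implementation.

```python
import math

def psi(baseline, live, eps=1e-6):
    """Population Stability Index between two binned probability
    distributions over the same bins. Larger values indicate drift."""
    return sum(
        (l - b) * math.log((l + eps) / (b + eps))
        for b, l in zip(baseline, live)
    )
```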