Title
Build Comprehensive Global Movie & TV Metadata Database for Recommendation Engine

Project Overview
I am building a personal recommendation engine that predicts my rating for movies and TV shows based on a large history of titles I have already rated. To support accurate predictions, I need a global media metadata database that contains rich structured information for movies and TV series. The dataset should combine multiple trusted sources and be designed for machine-learning comparison against my rating history.

Scope of Work
Build a master dataset containing global movie and TV metadata. This will serve as the candidate pool for prediction models. The database should include titles from:
• IMDb official dataset
• The Movie Database (TMDb) API
• Optional enrichment from JustWatch

Required Data Fields
Each title should include as many of the following as possible.

Core identification
• imdb_id
• tmdb_id
• title
• original_title
• year
• type (movie / series / episode / miniseries)

Basic metadata
• genres
• runtime
• language
• country
• release date

Creative team
• director(s)
• writer(s)
• top cast (first 10 billed)

Ratings and popularity
• IMDb rating
• IMDb vote count
• TMDb rating
• TMDb vote count

Narrative metadata
• plot summary
• keywords
• themes/tags if available

Structural attributes
• franchise/series linkage
• episode relationships for TV
• sequel/prequel relationships

Production information
• production companies
• budget
• revenue (if available)
• streaming availability (JustWatch)

Deliverables
1. Merged master dataset
   • CSV or PostgreSQL database
2. Schema documentation
3. Data cleaning
   • remove duplicates
   • normalize titles
   • unify IDs
4. ETL pipeline
   • scripts to refresh the dataset monthly
5. Matching keys
   • imdb_id
   • tmdb_id

Database Size Expectations
Movies: 600k+
TV series: 200k+

Technical Requirements
Preferred stack:
• Python
• Pandas
• PostgreSQL or SQLite
• API integration
• ETL scripting

Important Constraints
Do NOT scrape IMDb pages.
Use official datasets and APIs only.

Goal of the Project
The database will be used to:
• compare against a personal movie rating history
• calculate similarity between titles
• generate predicted ratings
• identify highly compatible unseen movies
Accuracy of metadata is critical.

Ideal Candidate
Experience with:
• media datasets
• ETL pipelines
• Python data engineering
• IMDb or TMDb APIs
• building large datasets
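
Illustrative example
To make the data cleaning and matching-key deliverables concrete, here is a rough sketch of what I have in mind for "remove duplicates, normalize titles, unify IDs". It is only an illustration under assumptions: the function names normalize_title and merge_sources, the title_key field, and the sample rows (IDs, titles, ratings) are made up for this example and are not real IMDb or TMDb data. It uses only the Python standard library to stay short; the actual pipeline would more likely use Pandas and PostgreSQL. The normalization rule is tuned for Latin-script titles, and other scripts would probably need their own handling.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Build a comparison key: accent-free, case-folded, punctuation-free."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of punctuation, whitespace, or underscores to one space.
    return re.sub(r"[\W_]+", " ", stripped.casefold()).strip()


def merge_sources(imdb_rows, tmdb_rows):
    """Return one record per imdb_id, enriched with TMDb fields when matched."""
    merged = {}
    for row in imdb_rows:
        imdb_id = row.get("imdb_id")
        if not imdb_id or imdb_id in merged:
            continue  # skip rows without a key and repeated IDs
        record = dict(row)
        record["title_key"] = normalize_title(row["title"])
        record["tmdb_id"] = None  # filled in below if TMDb has a match
        merged[imdb_id] = record
    for row in tmdb_rows:
        record = merged.get(row.get("imdb_id"))
        if record is None:
            continue  # unmatched TMDb rows would need separate handling
        record["tmdb_id"] = row["tmdb_id"]
        record["tmdb_rating"] = row.get("tmdb_rating")
    return list(merged.values())


# Made-up sample rows, only to exercise the functions above.
imdb_rows = [
    {"imdb_id": "tt0000001", "title": "Café Example: Part II", "year": 2001},
    {"imdb_id": "tt0000001", "title": "Cafe Example Part II", "year": 2001},
    {"imdb_id": "tt0000002", "title": "Another Sample Film", "year": 1999},
]
tmdb_rows = [
    {"imdb_id": "tt0000001", "tmdb_id": 101, "tmdb_rating": 7.9},
    {"imdb_id": "tt9999999", "tmdb_id": 999, "tmdb_rating": 5.0},
]

merged = merge_sources(imdb_rows, tmdb_rows)
```

In this sketch imdb_id is the primary matching key and tmdb_id is attached to it, so the first occurrence of each imdb_id wins. The title key is kept alongside the original title rather than replacing it, since original_title is itself a required field.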