I have a collection of raw text and image files and need a clear, defensible predictive model built from them. The job is pure data analysis, with an emphasis on predictive modelling rather than simple reporting. Because the data is unstructured, you’ll handle everything from ingestion and preprocessing through feature engineering, model training and evaluation. Python will be the core environment (feel free to rely on pandas, scikit-learn, TensorFlow or PyTorch where appropriate), and I’d like the exploration and model building kept in a well-commented Jupyter notebook. Visual explanations of the pipeline and results using Matplotlib or Seaborn are a must; an interactive Power BI or Excel summary on top of that is a plus but not required.

Deliverables
• A clean, reproducible notebook showing each step (loading, cleaning, feature extraction, model training, validation metrics)
• All supporting Python scripts and a requirements.txt for easy environment setup
• Visualisations that clearly communicate data distribution, model performance and key insights
• A brief README or slide deck summarising the approach, findings and recommendations for next steps

I’ll supply the data and any domain context once we start. In your proposal, please outline your suggested modelling approach, expected accuracy benchmarks and how long you anticipate each stage will take.
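For the environment-setup deliverable, an illustrative requirements.txt covering the libraries named above might look like the following (exact versions to be pinned once the approach is agreed):

```
pandas
scikit-learn
matplotlib
seaborn
jupyter
# plus tensorflow or torch, depending on the chosen modelling approach
```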
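To make the scope concrete, here is a minimal sketch of the kind of end-to-end flow described above (load → feature extraction → training → validation metrics), using scikit-learn. The tiny inline corpus and labels are invented placeholders standing in for the real data, which I’ll supply at kickoff; the actual feature extraction and model choice are up for discussion.

```python
# Sketch: raw text -> TF-IDF features -> classifier -> validation metrics.
# The texts/labels below are placeholder stand-ins for the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

texts = [
    "invoice overdue payment required", "meeting notes for project kickoff",
    "final payment reminder invoice", "agenda and notes from sync meeting",
    "billing statement amount due", "summary notes of quarterly review",
] * 5  # repeated so the train/test split has enough samples
labels = ["billing", "notes"] * 15  # aligned with the alternating texts

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# A single Pipeline keeps preprocessing and modelling reproducible,
# which matches the "clean, reproducible notebook" requirement.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

preds = model.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, preds):.2f}")
print(classification_report(y_test, preds))
```

The same skeleton extends naturally: image files would get their own feature-extraction step (e.g. a pretrained TensorFlow or PyTorch encoder) feeding the same train/validate loop.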