I have a mixed dataset with both text and numerical features that first needs to be scrubbed and structured in Python, ideally with pandas doing the heavy lifting. After the cleaning stage I want an exploratory dive (summary statistics, correlations, distributions, outlier checks) so we can truly understand what is driving the numbers (and the words) before modeling. The ultimate goal is a predictive analysis that not only trains a reliable model but also tells a compelling story through clear visualisations. Feel free to bring in scikit-learn, seaborn, matplotlib, or any other Python libraries that speed things up, as long as the workflow is reproducible (a Jupyter Notebook or .py scripts are fine).

Deliverables
• Cleaned dataset ready for downstream use
• EDA report with visual insights
• Predictive model with performance metrics and a short interpretation of the results
• All code and instructions so I can rerun everything on my end

If this sounds like your typical day in Python, let’s get started.
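To give a sense of scope, the cleaning stage might look like the sketch below: a minimal pandas pass over a hypothetical mixed frame. The column names (`comment`, `price`, `units`) and the median-imputation strategy are illustrative assumptions, not part of the brief.

```python
import pandas as pd

# Hypothetical raw export: one text column, numeric columns with
# typical messiness (strings in numeric fields, missing values).
raw = pd.DataFrame({
    "comment": [" Great product ", "late delivery", None, "ok"],
    "price": ["19.99", "5", "not available", "12.5"],
    "units": [3, None, 7, 1],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalise the text column: fill missing, strip whitespace, lowercase.
    out["comment"] = out["comment"].fillna("").str.strip().str.lower()
    # Coerce the numeric column; unparseable entries become NaN.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Impute missing numerics with the median (one simple choice of many).
    for col in ["price", "units"]:
        out[col] = out[col].fillna(out[col].median())
    return out

cleaned = clean(raw)
print(cleaned.dtypes)
```

The same pattern extends column by column; the point is that every fix is a pure function of the raw frame, which keeps the step reproducible.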
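The exploratory dive can start from plain pandas before any plotting: summary statistics via `describe()`, pairwise correlations, and a simple IQR rule for outlier flagging. The frame below is synthetic and its columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic numeric frame standing in for the cleaned dataset.
df = pd.DataFrame({
    "price": rng.normal(20, 5, 200),
    "units": rng.poisson(3, 200),
})
df.loc[0, "price"] = 200  # plant one obvious outlier

summary = df.describe()             # summary statistics per column
corr = df.corr(numeric_only=True)   # pairwise correlations

# IQR rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
print(summary.loc["mean"])
print(f"{len(outliers)} price outliers flagged")
```

Seaborn histograms and a correlation heatmap would sit naturally on top of `df` and `corr` for the visual half of the report.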
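For the predictive stage, one common scikit-learn pattern for mixed text and numeric features is a `ColumnTransformer` (TF-IDF for the text column, scaling for the numerics) feeding a `Pipeline`. The sketch below uses synthetic data and an assumed binary target purely to show the shape of the workflow, not a claim about the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: a text field, a numeric field, a binary target.
rng = np.random.default_rng(42)
n = 300
texts = rng.choice(
    ["great fast service", "late and damaged",
     "ok nothing special", "terrible support"], n)
score = rng.normal(0, 1, n)
y = (pd.Series(texts).str.contains("great|ok").to_numpy()
     ^ (score < -1.5)).astype(int)
X = pd.DataFrame({"review": texts, "score": score})

# Route the text column to TF-IDF and the numeric column to scaling.
pre = ColumnTransformer([
    ("text", TfidfVectorizer(), "review"),
    ("num", StandardScaler(), ["score"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print(f"test accuracy: {acc:.2f}")
```

Because the whole thing is one fitted `Pipeline`, the preprocessing and the model travel together, which makes the rerun-on-my-end deliverable straightforward.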