Python EDA for CSV Dataset

I have a single CSV file, roughly 1–5 GB in size, that mixes numerical and categorical columns. I want the data loaded into Python, cleaned, explored, and visualised so I can understand its main patterns and issues. Your workflow should revolve around Pandas for wrangling, NumPy for any numerical operations, and Matplotlib (Seaborn is fine too) for the charts.

Please:
• read the file efficiently,
• fix or clearly flag missing values, inconsistent types, and obvious outliers,
• run a concise exploratory data analysis covering distributions, basic correlations, and any other quick-win checks you judge useful,
• produce a handful of easy-to-read plots (think histograms, bar charts, scatter plots, simple heatmaps), and
• wrap everything in a well-commented Jupyter notebook so I can follow each step.

Alongside the notebook, include the cleaned dataset (CSV or Parquet) and a short paragraph-style summary of the key insights in plain English. Code must run end-to-end in a standard Python 3 environment using only open-source libraries.
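To make the loading and flagging steps above more concrete, here is a minimal sketch of the kind of code I have in mind. It is only an illustration under stated assumptions: the function names (`load_csv_in_chunks`, `missing_report`, `flag_iqr_outliers`), the column names in the toy data, the 0.5 cardinality cut-off, and the 1.5 × IQR outlier rule are all hypothetical choices, not requirements, and the in-memory sample stands in for the real file.

```python
import io

import numpy as np
import pandas as pd


def load_csv_in_chunks(source, chunksize=100_000, category_threshold=0.5):
    """Read a CSV in chunks, then store low-cardinality text as 'category'."""
    # Passing chunksize makes read_csv return an iterator of DataFrames.
    chunks = pd.read_csv(source, chunksize=chunksize)
    df = pd.concat(chunks, ignore_index=True)
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # leave numeric (and boolean) columns alone
        # Assumed heuristic: few distinct values relative to the row count.
        if df[col].nunique(dropna=True) / max(len(df), 1) < category_threshold:
            df[col] = df[col].astype("category")
    return df


def missing_report(df):
    """Summarise dtype and missing-value counts per column."""
    report = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "pct_missing": (df.isna().mean() * 100).round(2),
        }
    )
    return report.sort_values("pct_missing", ascending=False)


def flag_iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.nanpercentile(series.to_numpy(dtype=float), [25, 75])
    iqr = q3 - q1
    # Comparisons with NaN are False, so missing values are never flagged.
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy data standing in for the real file (hypothetical columns).
demo_csv = io.StringIO(
    "id,amount,region\n"
    "1,10.0,north\n"
    "2,12.0,south\n"
    "3,,north\n"
    "4,11.0,south\n"
    "5,13.0,north\n"
    "6,500.0,\n"
)

df = load_csv_in_chunks(demo_csv, chunksize=2)
report = missing_report(df)
outliers = flag_iqr_outliers(df["amount"])

print(report)
print(df[outliers])
```

Note that concatenating every chunk still needs the full table to fit in memory; if it does not, the same loop could aggregate per chunk instead. Flagging outliers with a mask rather than dropping them keeps the decision visible in the notebook.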

Python
