Employee Attrition Prediction Model

Budget: $25

I have a dataset containing demographic details, job performance records and results from employee-engagement surveys. Your task is to turn this raw information into a reliable attrition-prediction pipeline.

Work starts with careful cleaning and preprocessing: handle missing values, encode categorical variables, standardise or normalise where needed and document every step so the workflow is fully reproducible. A brief exploratory analysis should follow to highlight key attrition drivers and verify data quality before modelling.

For the classifier, I’d like you to focus on K-Nearest Neighbours. If you find that another algorithm beats KNN convincingly, feel free to present the comparison, but please include KNN in the final report. Train, tune and validate the model, then evaluate it with accuracy, precision, recall, F1 and ROC-AUC. I expect a concise explanation of hyper-parameter choices and cross-validation results.

Visual insights are important to the management team, so be sure to include:
• Attrition rates by department
• Attrition rates by tenure bands
• A feature-importance view (even with KNN you can approximate this through permutation or SHAP)

Deliverables:
• The cleaned, well-documented dataset (CSV or Parquet)
• A self-contained Python notebook or script that runs end-to-end
• All generated visualisations in PNG or embedded in the notebook
• A short report summarising methods, results and next-step recommendations

The code should run on standard Python 3 with common libraries such as pandas, scikit-learn, matplotlib/seaborn (or Plotly if you prefer interactive charts).

Once finished, I’ll verify that:
1. The notebook executes without errors on my machine.
2. Reported metrics match those produced by the code.
3. Visualisations clearly reflect the three requested views.

If that sounds clear, let’s get started.
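As a rough illustration of the workflow described above, and not a required implementation, a minimal sketch on synthetic stand-in data might look like the following. The column names (`age`, `tenure_years`, `engagement_score`, `department`, `attrition`), the tenure band edges, the parameter grid and the choice of ROC-AUC as the tuning metric are all assumptions made for illustration. The real dataset will have different columns and will likely need more cleaning.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(20, 60, n).astype(float),
    "tenure_years": rng.uniform(0, 20, n),
    "engagement_score": rng.uniform(1, 5, n),
    "department": rng.choice(["Sales", "R&D", "HR"], n),
})
logit = 3.0 - 0.9 * df["engagement_score"] - 0.15 * df["tenure_years"]
df["attrition"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)
# Inject a few missing values so the imputation step has something to do.
df.loc[rng.choice(n, 20, replace=False), "engagement_score"] = np.nan

# Two of the requested views, as tables (plot them with matplotlib or seaborn).
dept_rates = df.groupby("department")["attrition"].mean()
bands = pd.cut(df["tenure_years"], bins=[0, 2, 5, 10, 20], include_lowest=True)
tenure_rates = df.groupby(bands, observed=True)["attrition"].mean()

X, y = df.drop(columns="attrition"), df["attrition"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# KNN is distance-based, so numeric features are imputed and then scaled.
numeric_cols = ["age", "tenure_years", "engagement_score"]
categorical_cols = ["department"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([("prep", preprocess), ("knn", KNeighborsClassifier())])

# Tune k and the weighting scheme with stratified 5-fold cross-validation.
search = GridSearchCV(
    model,
    param_grid={"knn__n_neighbors": [5, 11, 21],
                "knn__weights": ["uniform", "distance"]},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
pred = search.predict(X_test)
proba = search.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "f1": f1_score(y_test, pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, proba),
}

# Permutation importance as a model-agnostic feature-importance view for KNN.
perm = permutation_importance(search.best_estimator_, X_test, y_test,
                              scoring="roc_auc", n_repeats=5, random_state=0)
importances = pd.Series(perm.importances_mean, index=X_test.columns)

print(search.best_params_)
print(metrics)
print(importances.sort_values(ascending=False))
```

Keeping the imputer, scaler and encoder inside the `Pipeline` means they are refitted on each training fold, which avoids leaking information from the validation folds into the preprocessing.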

Python
