Customer Churn Analysis & Prediction

Client: AI | Published: 15.04.2026

Project Overview
- Conducted an end-to-end customer churn analysis project to identify churn patterns and predict customer attrition.
- Combined exploratory data analysis (EDA) with machine learning modeling to generate actionable business insights.
- Aimed to help businesses reduce churn, improve customer retention, and increase profitability.

Dataset Details
The dataset includes customer demographic, financial, and behavioral features:
- Identifiers: RowNumber, CustomerId, Surname
- Demographics: Geography, Gender, Age
- Financial: CreditScore, Balance, EstimatedSalary
- Account and behavior: Tenure, NumOfProducts, HasCrCard, IsActiveMember
- Target variable: Exited (churn status)

Tools & Technologies
- Python libraries:
  - pandas, numpy (data processing)
  - matplotlib, seaborn (data visualization)
  - scikit-learn (ML modeling)
- Advanced techniques:
  - SMOTE (handling class imbalance)
  - Feature scaling and encoding
  - Model evaluation metrics

Exploratory Data Analysis (EDA)
- Performed in-depth EDA to uncover:
  - Customer behavior trends
  - Churn patterns across geography, age, and balance
  - Correlations between features and churn
- Created visualizations: heatmaps, distribution plots, and count plots
- Identified key drivers of churn: age, inactivity, low engagement, and account balance

Machine Learning Models Implemented
- Logistic Regression
- Random Forest Classifier
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- XGBoost
- Gradient Boosting

Handling Imbalanced Data
- Applied SMOTE (Synthetic Minority Oversampling Technique) to:
  - Balance the churn and non-churn classes
  - Improve recall and F1 score for the minority (churn) class
- Used class weighting so that errors on the minority class are penalized more heavily

Model Performance Summary
Models were evaluated using accuracy, recall, F1 score, and ROC-AUC.

Key results:
- Gradient Boosting: best overall performer, with the highest F1 score (0.598) and the highest ROC-AUC (0.859)
- XGBoost: strong second-best model, with balanced precision and recall
- Random Forest: high accuracy but weaker at detecting churners
- SVM and KNN: moderate performance
- Logistic Regression: least effective on this dataset

Key Insights
- Customer churn is strongly influenced by:
  - Low activity levels
  - Lower product engagement
  - Demographic factors (age, geography)
- Gradient Boosting and XGBoost handled the class imbalance better than the other models and produced the most reliable predictions.

Business Impact
- Helps businesses:
  - Identify high-risk customers
  - Design targeted retention strategies
  - Improve customer lifetime value (CLV)
- Provides a data-driven foundation for decision-making

Deliverables
- Cleaned and processed dataset
- EDA report with visual insights
- Trained ML models
- Model comparison report
- Prediction-ready pipeline
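
The feature scaling and encoding step listed under Tools & Technologies can be illustrated with a minimal, dependency-free Python sketch. This is not the project's actual code. It assumes each customer record is a dict keyed by the dataset's column names. The sketch drops the identifier columns, z-scores the numeric columns (population standard deviation), and one-hot encodes Geography and Gender. The function names `fit_preprocessor` and `transform` and the toy rows are hypothetical. The project lists pandas and scikit-learn, where `StandardScaler` and `OneHotEncoder` would typically cover the same ground. Statistics are learned from training rows only, so held-out data can be transformed without leaking information into the fit.

```python
import math

# Column roles, assumed from the Dataset Details section.
ID_COLUMNS = ("RowNumber", "CustomerId", "Surname")  # identifiers, no predictive signal
CATEGORICAL = ("Geography", "Gender")
TARGET = "Exited"


def fit_preprocessor(rows):
    """Learn one-hot levels and per-column mean/std from the training rows only."""
    levels = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    skip = ID_COLUMNS + CATEGORICAL + (TARGET,)
    numeric = [c for c in rows[0] if c not in skip]
    stats = {}
    for c in numeric:
        values = [float(r[c]) for r in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[c] = (mean, std or 1.0)  # avoid division by zero for constant columns
    return levels, stats


def transform(rows, levels, stats):
    """Return (X, y): z-scored numeric columns followed by one-hot categoricals."""
    X, y = [], []
    for r in rows:
        features = [(float(r[c]) - mean) / std for c, (mean, std) in stats.items()]
        for c, cats in levels.items():
            # Categories not seen during fitting become an all-zero block.
            features.extend(1.0 if r[c] == cat else 0.0 for cat in cats)
        X.append(features)
        y.append(int(r[TARGET]))
    return X, y


# Two made-up rows with a subset of the columns, for illustration only.
train = [
    {"RowNumber": 1, "CustomerId": 101, "Surname": "A", "CreditScore": 600,
     "Geography": "France", "Gender": "Female", "Age": 40, "Exited": 1},
    {"RowNumber": 2, "CustomerId": 102, "Surname": "B", "CreditScore": 700,
     "Geography": "Spain", "Gender": "Male", "Age": 30, "Exited": 0},
]
levels, stats = fit_preprocessor(train)
X, y = transform(train, levels, stats)
print(X[0])  # [-1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
```

The feature order is the numeric columns (here CreditScore, Age), then the sorted Geography levels, then the sorted Gender levels.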
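
The SMOTE step described under Handling Imbalanced Data creates synthetic churners by interpolating between minority-class neighbors rather than duplicating existing rows. The sketch below is a simplified, stdlib-only illustration of that idea for binary labels, not the project's actual code. In practice a library implementation, such as imbalanced-learn's `SMOTE` class with its `fit_resample` method, would normally be used. The `smote` function, its parameters, and the toy data are hypothetical. Resampling should be applied to the training split only, so that synthetic points never appear in the evaluation data.

```python
import math
import random


def smote(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE for binary labels (hypothetical helper).

    Adds synthetic minority samples until both classes have the same size.
    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority-class neighbors (Euclidean distance).
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_pts)
    n_needed = n_majority - len(minority_pts)
    if n_needed <= 0 or len(minority_pts) < 2:
        return list(X), list(y)  # already balanced, or too few points to interpolate

    k = min(k, len(minority_pts) - 1)
    # Precompute the k nearest minority neighbors of every minority sample.
    neighbors = []
    for i, p in enumerate(minority_pts):
        others = [q for j, q in enumerate(minority_pts) if j != i]
        others.sort(key=lambda q: math.dist(p, q))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_pts))
        base, nn = minority_pts[i], rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])

    return list(X) + synthetic, list(y) + [minority] * n_needed


# Toy data: six non-churners (0) and two churners (1).
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0],
     [4.0, 4.0], [5.0, 5.0], [9.0, 9.0], [10.0, 10.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote(X, y, k=1)
print(y_res.count(0), y_res.count(1))  # 6 6
```

All four synthetic rows fall on the segment between the two original churners, which is what distinguishes SMOTE from simple random oversampling.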
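
Class weighting, also mentioned under Handling Imbalanced Data, is an alternative or complement to resampling: the model's loss counts errors on the rarer class more heavily. The source does not say which weighting scheme was used. The sketch below assumes the common "balanced" heuristic, n_samples / (n_classes * count(class)), which is the formula scikit-learn documents for its `class_weight="balanced"` option. The function name and toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * count(class))."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


y = [0] * 8 + [1] * 2  # toy labels: 8 retained customers, 2 churners
weights = balanced_class_weights(y)
print(weights)  # {0: 0.625, 1: 2.5}
```

With these weights each misclassified churner costs four times as much as a misclassified retained customer (2.5 / 0.625 = 4), which offsets the 8:2 class ratio.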
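
The metrics in the Model Performance Summary can be computed directly from predictions. Doing so also shows why accuracy alone is misleading on imbalanced churn data. A model that rarely predicts churn can still score high accuracy while its recall on churners stays low, which is consistent with the Random Forest result above. The sketch is stdlib-only and hypothetical: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions. Precision is included because F1 is the harmonic mean of precision and recall. scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same purpose.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with churn (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: share of (churner, non-churner) pairs ranked correctly.

    Ties count as half a win. O(n_pos * n_neg), so only suitable for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted churn probabilities (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])      # 0.75
print(roc_auc(y_true, scores))  # 0.8
```

In this toy case precision, recall, and F1 all come out at 2/3. ROC-AUC (0.8) does not depend on the threshold because it ranks the raw scores.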