Heart Disease Prediction Pipeline

Client: AI | Published: 03.12.2025
Budget: $250

I have a clean CSV file with patient health records and now need an end-to-end machine-learning pipeline that can turn those rows into reliable, real-time heart-disease risk scores.

Here is the flow I have in mind. We start with thorough preprocessing (missing-value handling, outlier checks, and sensible encoding), followed by PCA to gauge dimensionality and a feature-selection step so we keep only clinically relevant signals. From there I want three supervised models trained side by side: Logistic Regression for interpretability, Random Forest for strong baseline accuracy, and Support Vector Machine to probe nonlinear decision boundaries. Feel free to add an unsupervised technique (e.g., K-means clustering) if it helps surface hidden patient segments. Hyperparameter tuning should be systematic (GridSearchCV or Optuna is fine). Once the best model set is locked in, export the artifacts with versioned pickle or joblib files and document the environment so I can reproduce results later.

The finished models must power a Streamlit dashboard where a user can paste or upload new patient data, adjust individual input fields, and instantly see the predicted probability of heart disease alongside confidence scores. Clean, intuitive input widgets and concise on-screen feedback are crucial. If you can wrap the app in Ngrok for quick sharing, great; otherwise local hosting is sufficient.

Deliverables:
• Well-commented Python notebook or .py scripts covering preprocessing, PCA, feature selection, model training, and evaluation
• Saved model files plus requirements.txt or environment.yml
• Streamlit application folder ready to run with `streamlit run app.py`
• Brief README that walks me through setup, retraining, and optional Ngrok exposure

Code should lean on pandas, scikit-learn, matplotlib/seaborn, and Streamlit; please avoid heavyweight dependencies unless absolutely necessary. I’ll test with an unseen subset of the CSV to verify the pipeline before sign-off.
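
To make the expected training flow concrete, here is a rough sketch of the kind of structure I have in mind, not a required implementation. It uses synthetic data because the real CSV is not attached: the column names (`age`, `chol`, `max_hr`, `chest_pain`, `target`), the parameter grids, the 95% PCA variance threshold, and the file name `heart_model_v1.joblib` are all placeholder assumptions. It covers imputation, scaling, encoding, PCA, tuning of the three models with GridSearchCV, and a joblib export. Outlier checks, the explicit feature-selection step, and the Streamlit app are left out for brevity.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real CSV; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(30, 80, n).astype(float),
    "chol": rng.normal(240, 40, n),
    "max_hr": rng.normal(150, 20, n),
    "chest_pain": rng.choice(["typical", "atypical", "none"], n),
})
df.loc[rng.choice(n, 15, replace=False), "chol"] = np.nan  # simulate missing values
df["target"] = (df["age"] + rng.normal(0, 10, n) > 55).astype(int)

numeric = ["age", "chol", "max_hr"]
categorical = ["chest_pain"]

# Preprocessing: median imputation + scaling for numbers, one-hot for categories.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0.0,  # force dense output so PCA receives a plain array
)

# The three requested models, each with a small illustrative grid.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "rf": (RandomForestClassifier(random_state=0),
           {"clf__n_estimators": [100, 200], "clf__max_depth": [None, 5]}),
    "svm": (SVC(probability=True, random_state=0), {"clf__C": [0.5, 1.0]}),
}

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"],
    test_size=0.2, stratify=df["target"], random_state=0,
)

results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([
        ("prep", preprocess),
        # Keep enough components to explain 95% of the variance (assumed threshold).
        ("pca", PCA(n_components=0.95, svd_solver="full")),
        ("clf", model),
    ])
    search = GridSearchCV(pipe, grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    results[name] = search

# Pick the model with the best cross-validated ROC AUC and save the whole pipeline.
best_name = max(results, key=lambda k: results[k].best_score_)
best_model = results[best_name].best_estimator_
model_path = Path(tempfile.gettempdir()) / "heart_model_v1.joblib"
joblib.dump(best_model, model_path)

# Probability of the positive class for the held-out rows.
proba = best_model.predict_proba(X_test)[:, 1]
print(best_name, round(results[best_name].best_score_, 3))
```

Saving the full pipeline rather than only the classifier means the dashboard can call `joblib.load(...)` and pass raw input rows straight to `predict_proba`, with preprocessing and PCA applied exactly as in training.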