I am competing in the Kaggle “ICU Length of Stay” challenge and need a fresh set of eyes to push my RMSE down from 4.16 into the 3.5–4.0 zone. The data come from MIMIC-III and include a mix of numeric vitals and categorical chart events. Certain leakage-prone columns (DOD, DISCHTIME, DEATHTIME, HOSPITAL_EXPIRE_FLAG) must stay out of every fold and the final model.

My first priority is model tuning. I have baseline LightGBM and XGBoost notebooks, but they are plateauing, so I’m open to well-tuned tree ensembles, stacked regressors, KNN, SVM, or a lightweight neural network, whatever moves the needle. You are free to add clever feature work, as long as everything plays nicely with the chosen model. Imputing the mixed missingness in a principled way is essential, but feel free to lean on tried-and-true methods unless something more sophisticated clearly pays off.

Deliverable
• A single, clean Jupyter Notebook (.ipynb) that:
 – loads the public MIMIC-III competition files and performs the one-hot encoding plus any additional transformations you deem useful,
 – handles missing values properly,
 – walks through your tuning strategy step by step (cross-validation, early stopping, hyper-parameter search, ensembling logic, etc.),
 – reports reproducible CV results plus a Kaggle test submission file in CSV format,
 – includes concise markdown cells explaining the reasoning so I can justify the approach in a write-up.

I value transparency, so please comment the source code clearly and cite any external libraries or snippets you borrow. Experience with regression tasks or clinical/health-care datasets will stand out; let me know what you’ve tackled before and how it went. Looking forward to seeing how far you can push the score.