HC18 Dataset Prep & Mask Fix

Client: AI | Published: 20.09.2025

I’m cleaning up a computer-vision repository called “fetal_biometry_ai” and need an experienced data wrangler to get the public HC18 ultrasound dataset ready for model training. The job starts with downloading the original DICOM files, splitting them into consistent train/test folders, and converting every study to a usable image format (PNG or JPG, whichever preserves fidelity while playing nicely with PyTorch and OpenCV).

After conversion, I want the usual preprocessing steps scripted in Python: intensity normalization, reliable resizing to a uniform resolution, and any simple augmentation you think improves generalisation without distorting anatomy. The code should run end-to-end from a single CLI command inside VS Code, with clear paths I can edit later.

Some slices arrive without segmentation masks; propose and implement a clean solution, whether that means skipping them, synthesising placeholders, or another tactic you can justify, so that downstream dataloaders never break. Please keep the repo tidy (separate “raw”, “processed”, and “masks” folders), document dependencies in requirements.txt, and leave concise comments so I can follow the pipeline. To make the intent concrete, I’ve put a rough preprocessing sketch and an example folder layout below; treat them as illustrations, not a spec. When everything is reproducible on my machine, I’ll merge.
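To give a feel for the preprocessing and missing-mask handling I have in mind, here is a minimal sketch. It assumes the converted slices already sit in flat folders; the directory names, the 512×512 target size, and the all-zero placeholder mask are only assumptions you are free to replace.

```python
# Sketch only: illustrates normalization, resizing, and missing-mask handling.
# Paths, target size, and the placeholder strategy are placeholders, not requirements.
from pathlib import Path

import cv2
import numpy as np

RAW_DIR = Path("data/raw")        # converted PNG/JPG slices
MASK_DIR = Path("data/masks")     # ground-truth masks; some slices have none
OUT_DIR = Path("data/processed")  # normalized, resized output
TARGET_SIZE = (512, 512)          # uniform resolution, adjust as you see fit


def preprocess_image(path: Path) -> np.ndarray:
    """Load a grayscale slice, min-max normalize to [0, 1], resize."""
    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE).astype(np.float32)
    img = (img - img.min()) / max(img.max() - img.min(), 1e-8)
    return cv2.resize(img, TARGET_SIZE, interpolation=cv2.INTER_AREA)


def load_or_placeholder_mask(image_path: Path) -> np.ndarray:
    """Return the matching mask, or an all-zero placeholder when it is absent,
    so downstream dataloaders always receive an (image, mask) pair."""
    mask_path = MASK_DIR / image_path.name
    if mask_path.exists():
        mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
        return cv2.resize(mask, TARGET_SIZE, interpolation=cv2.INTER_NEAREST)
    return np.zeros(TARGET_SIZE, dtype=np.uint8)


if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(RAW_DIR.glob("*.png")):
        image = preprocess_image(image_path)
        mask = load_or_placeholder_mask(image_path)
        # One .npz per slice keeps image and mask paired on disk.
        np.savez(OUT_DIR / image_path.with_suffix(".npz").name, image=image, mask=mask)
```

An all-zero placeholder keeps batch shapes uniform; if you would rather drop unmasked slices entirely, filtering them out at this stage and logging what was skipped is equally acceptable, as long as you justify the choice.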
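For the repo layout and the single-command entry point, something along these lines is what I mean; the script name and flag names below are suggestions, not fixed requirements.

```
fetal_biometry_ai/
├── data/
│   ├── raw/         # original DICOM studies, untouched
│   ├── processed/   # normalized, resized images ready for PyTorch
│   └── masks/       # segmentation masks (real or placeholder)
├── prepare_hc18.py  # end-to-end pipeline entry point
└── requirements.txt # pinned dependencies (e.g. numpy, opencv-python, torch)

# single command with editable paths:
python prepare_hc18.py --raw-dir data/raw --mask-dir data/masks --out-dir data/processed
```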