Clean & Prep Financial Data

Client: AI | Published: 07.01.2026
Budget: $750

I have a sizeable collection of raw financial records that must be transformed into a clean, analysis-ready dataset. The focus is strictly data analysis, and within that, the job revolves around data cleaning and preprocessing only: no modelling or dashboards at this stage.

The source files arrive in mixed formats (CSV exports, Excel workbooks, a few SQL dumps). Common issues include missing values, duplicated transactions, inconsistent date and currency formats, and occasional out-of-range outliers.

I need repeatable, well-documented code, ideally in Python with pandas and NumPy, though R (dplyr, data.table) is fine if that’s your preferred stack. SQL for initial extracts is also available.

Deliverables
• A reproducible script or notebook that ingests the raw files, performs the full cleaning pipeline, and writes out a consolidated, tidy dataset.
• A short README or inline comments explaining each major step (e.g., how missing values are imputed, rules for outlier handling, field renaming conventions).
• A log or summary report highlighting any records removed or corrected so I can audit the changes.

The final dataset should load without warnings, match the original row counts minus intentional removals, and preserve all monetary values to two-decimal precision.

If you can meet these criteria and have solid experience wrangling financial data, let’s get started.
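
To make the acceptance criteria concrete, here is a minimal sketch of how the core steps (deduplication, date and currency normalisation, and an audit log of removals) might look in pandas. Everything specific in it is an assumption for illustration: the column names (txn_id, date, amount), the two date formats, the "$" symbol, and the rule that incomplete rows are dropped rather than imputed. Outlier handling is left out because the posting does not define what counts as out of range.

```python
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd

# Hypothetical sample standing in for the raw exports; the column names
# (txn_id, date, amount) are assumptions, not taken from the real files.
raw = pd.DataFrame(
    {
        "txn_id": ["T1", "T2", "T2", "T3", "T4"],
        "date": ["2025-01-03", "04/01/2025", "04/01/2025", "2025-01-05", None],
        "amount": ["$1,200.50", "99.999", "99.999", None, "45"],
    }
)

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats; extend as needed


def parse_date(value):
    """Try each known format in turn; return NaT when none matches."""
    if pd.isna(value):
        return pd.NaT
    for fmt in DATE_FORMATS:
        try:
            return pd.to_datetime(value, format=fmt)
        except ValueError:
            continue
    return pd.NaT


def parse_amount(value):
    """Strip the currency symbol and separators, round to two decimals."""
    if pd.isna(value):
        return None
    cleaned = str(value).replace("$", "").replace(",", "").strip()
    return Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def clean(df):
    """Return (clean_df, audit_log); the log counts every removal."""
    log = {"rows_in": len(df)}

    # Exact duplicate rows are treated as duplicated transactions.
    deduped = df.drop_duplicates()
    log["duplicates_removed"] = len(df) - len(deduped)

    out = deduped.copy()
    out["date"] = out["date"].map(parse_date)
    out["amount"] = out["amount"].map(parse_amount)

    # Assumed rule: rows missing a date or an amount are dropped, not imputed.
    incomplete = out["date"].isna() | out["amount"].isna()
    log["incomplete_removed"] = int(incomplete.sum())
    out = out.loc[~incomplete].reset_index(drop=True)

    log["rows_out"] = len(out)
    return out, log


cleaned, audit = clean(raw)
print(audit)
```

Amounts are held as Decimal rather than float so that two-decimal precision survives a round trip to CSV, at the cost of slower arithmetic. The audit dictionary supports the row-count check directly: rows_in minus the two removal counts should equal rows_out.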