Python Developer for PDF Price Book → Database & Auto-Update (ETL) Tool

Description: We need a Python developer to build a system that ingests manufacturer PDF price books, extracts structured pricing data, and stores it in a normalized database. The tool must detect updates when new editions are uploaded, highlight changes, and provide a lightweight admin UI for upload, preview, comparison, and export.

Phase 1 – Price Book ETL Tool
- Parse both digital and scanned PDFs (OCR fallback).
- Extract items, finishes, options, dimensions, pricing, and effective dates.
- Normalize outputs (Excel/CSV + SQL or Baserow).
- Auto-diff engine: update prices, insert/remove SKUs, log changes.
- Lightweight UI for upload, preview, compare, and export.
- Accuracy targets: ≥98% rows, ≥99% numeric values.

Phase 2 – Configuration & Pricing Tool
- Build configurable product sets (assemblies/kits) using parsed data.
- Apply rules: requires/excludes, weight limits, finish mappings, net add/percent add.
- Pricing engine: base + adders + discounts with full audit trail.
- Auto-reprice saved sets when new price books are uploaded.
- Exports: quote-ready Excel/PDF, CSV for ERP, and REST API endpoints.
- Admin UI: drag/drop set builder, validation engine, approval workflow, versioning.

Technical Requirements
- Python 3.11+, pdfplumber/camelot/pdfminer.six, OCR fallback (Tesseract/LayoutParser).
- Postgres/MySQL (or Baserow API).
- Fuzzy matching (Levenshtein/TF-IDF).
- Lightweight web UI (Flask/Django/FastAPI; React optional).

What We Provide
- Two real-world price books (Hager & SELECT Hinges, 2025 editions).
- A vibe-coded project as reference.

Python
