Extract Photo & Signature from Scanned PDF Files (Python / Image Processing)

I need a reliable script/system that can extract photo and signature from scanned PDF files. These PDFs are fully scanned documents (no text layer) where photo and signature are part of the image.

Key Requirements (Read Carefully):
- PDFs are scanned (image-based), not digital
- Layout is mostly the same, but photo/signature boxes may shift slightly
- Photo & signature are usually on page 1
- If box detection fails, fallback to face detection for photo is required
- Signature extraction should work using image processing (ink/contour based)
- Output files must use same base filename as the PDF
  Example: ABC123.pdf -> ABC123_photo.jpg, ABC123_sign.png

Technical Expectations:
- Language: Python (preferred)
- Libraries: OpenCV, PyMuPDF / pdf2image, NumPy (or equivalent)
- DPI handling (300–400 DPI)
- Deskew / preprocessing for scanned PDFs
- Batch processing (folder-based)
- Clean, reusable, well-structured code

Nice to Have (Bonus):
- Config-driven ROI (JSON instead of hardcoded values)
- Logging (success / failure)
- Debug images for failed cases
- Ability to handle multiple pages if page 1 fails

What I Will Provide:
- Sample scanned PDFs
- Clear expected output examples
- Feedback during development

Who Should Apply:
- You have real experience with image processing
- You understand scanned documents (not OCR-only solutions)
- You can deliver working logic, not just demo scripts

Do NOT apply if:
- You only work with text-based PDFs
- You rely only on OCR
- You have no OpenCV/image-processing experience

Budget: Open to reasonable bids based on solution quality.
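To make the naming rule, the folder-based batch step, and the JSON-driven ROI idea concrete, here is a rough sketch of what I have in mind. It uses only the Python standard library and does no image processing. The config layout (boxes in inches plus a "dpi" key), the placeholder numbers, and the function names `load_rois`, `output_paths`, and `list_pdfs` are all assumptions for illustration, not a required design.

```python
import json
from pathlib import Path

# Hypothetical ROI config: boxes are given in inches from the top-left
# corner of the page, so the same values can be scaled to 300 or 400 DPI.
# All numbers below are placeholders, not measured from real samples.
EXAMPLE_CONFIG = """
{
  "dpi": 300,
  "photo":     {"x": 6.0, "y": 1.0, "w": 1.5, "h": 2.0},
  "signature": {"x": 5.5, "y": 3.2, "w": 2.5, "h": 0.8}
}
"""


def load_rois(config_text):
    """Parse the JSON config and return (dpi, {name: (x, y, w, h) in pixels})."""
    cfg = json.loads(config_text)
    dpi = cfg["dpi"]
    rois = {}
    for name in ("photo", "signature"):
        box = cfg[name]
        # Inches times DPI gives pixels; round to the nearest whole pixel.
        rois[name] = tuple(round(box[key] * dpi) for key in ("x", "y", "w", "h"))
    return dpi, rois


def output_paths(pdf_path, out_dir):
    """Build the two output paths from the PDF's base filename."""
    stem = Path(pdf_path).stem  # "ABC123.pdf" gives "ABC123"
    out = Path(out_dir)
    return out / f"{stem}_photo.jpg", out / f"{stem}_sign.png"


def list_pdfs(folder):
    """Return all PDF files in a folder (non-recursive), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

In this sketch the pixel boxes would only be a starting search area, since the photo/signature boxes may shift slightly; the actual box detection, face-detection fallback, and ink/contour signature extraction are what I need from you.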

Python
