Punjabi Voter List OCR Adaptation

Client: AI | Published: 01.02.2026

I already have a rock-solid script that converts Tamil and English voter-list PDFs (scanned images) straight into neatly structured Excel sheets with perfect accuracy. Now I need that same reliability extended to Punjabi.

The requirement is tough but clear: deliver an OCR module that handles scanned electoral rolls in Punjabi and pushes the data into Excel with better than 99% accuracy. The current pipeline is built in Python, relying largely on classical OCR techniques and some OpenCV preprocessing. It does not depend on heavy machine-learning models, and I want to keep it that way as much as possible. If you must introduce lightweight AI tricks to hit the accuracy target, document them so they can be toggled on or off.

Deliverables
• A standalone Python script (or module) dedicated to the Punjabi language and alphanumeric house numbers
• Clear setup instructions plus any custom-trained OCR data files you create
• A sample run on at least one full Punjabi voter-list PDF, showing an Excel output whose character-level accuracy exceeds 99%

Acceptance will be based on that verified accuracy score, on speed, and on little or no dependency on paid AI services.

Once Punjabi is nailed, I'll commission similar modules for Bengali, Oriya, Assamese, Malayalam, Kannada, Marathi and Gujarati, so think modular and reusable from the start.
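
For reference, here is one way the character-level accuracy figure could be verified. This is a minimal sketch, not the scoring method I necessarily use: it assumes accuracy is defined as 1 minus the edit distance divided by the length of a hand-checked reference text, and that both texts are Unicode-normalized (NFC) first. The function names `levenshtein` and `char_accuracy` are illustrative. Note that it counts Unicode code points, so a Gurmukhi vowel sign or bindi counts as its own character.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def char_accuracy(ocr_text: str, truth_text: str) -> float:
    """Character-level accuracy of OCR output against a reference text.

    Assumed definition: 1 - edit_distance / len(reference), floored at 0.
    """
    ocr = unicodedata.normalize("NFC", ocr_text)
    truth = unicodedata.normalize("NFC", truth_text)
    if not truth:
        return 1.0 if not ocr else 0.0
    errors = levenshtein(ocr, truth)
    return max(0.0, 1.0 - errors / len(truth))


if __name__ == "__main__":
    reference = "ਪੰਜਾਬ"   # hand-verified cell value (5 code points)
    recognised = "ਪੰਜਾਲ"  # OCR output with one wrong letter
    print(f"accuracy = {char_accuracy(recognised, reference):.2%}")
```

In practice the two strings would be the concatenated cell values of the generated Excel sheet and of a manually verified copy of the same pages; a result above 0.99 would meet the threshold under this definition.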