Playwright RPA for GeM Tenders

Client: AI | Published: 01.12.2025

I need an MVP that reliably signs in to India’s Government e-Marketplace (GeM), navigates the tender section, and extracts every newly published tender that matches a few simple search filters. The run has to survive GeM’s multifactor login, pausing for my one-time password before it resumes. Once logged in, the robot should download each tender PDF, run OCR so all text becomes searchable, push both the raw file and the extracted JSON to an S3 bucket, and log every step so the entire flow is auditable later. The data will ultimately feed client-facing reports, so accuracy and repeatability are a must.

Stack requirements
• Playwright is non-negotiable for the browser automation
• Python for the backend worker and API endpoints
• Tesseract, AWS Textract, or another proven OCR engine (I’m open as long as quality scores are high)
• AWS SDK / boto3 for S3 uploads
• Structured logging plus a simple dashboard or log file I can review for each run

Core deliverables
1. Playwright script that completes login (human-in-the-loop OTP), applies saved GeM search, downloads tender PDFs.
2. OCR pipeline that converts every PDF to clean, searchable text and returns a keyed JSON.
3. Storage layer that writes both PDF and JSON to my specified S3 bucket following a predictable folder structure.
4. Minimal Flask or FastAPI service exposing “run job” and “job status” endpoints.
5. README with setup, environment variables, and step-by-step instructions to reproduce results on a fresh AWS account.

Acceptance criteria
• End-to-end run finishes without manual intervention apart from OTP.
• All actions (navigation, download, OCR, upload) are timestamped in an audit log.
• Resulting JSON passes a spot check on at least 20 random tenders with 95% field accuracy.
• Codebase installs with a single `make setup` or `pip install -r requirements.txt` command.
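
For illustration only, here is a minimal sketch of what a "predictable folder structure" and a timestamped audit entry could look like. The key layout, the `gem-tenders` prefix, the function names `s3_keys` and `audit_record`, and the sample tender ID are all assumptions made for this example, not part of the requirements. The sketch uses only the Python standard library and does not call S3 itself.

```python
import json
import logging
from datetime import datetime, timezone


def s3_keys(tender_id: str, published: datetime, prefix: str = "gem-tenders") -> dict:
    """Return the PDF and JSON object keys for one tender (hypothetical layout)."""
    day = published.strftime("%Y/%m/%d")
    # Replace slashes in the identifier so it does not add extra folder levels.
    safe_id = tender_id.replace("/", "_")
    base = f"{prefix}/{day}/{safe_id}"
    return {"pdf": f"{base}/tender.pdf", "json": f"{base}/tender.json"}


def audit_record(run_id: str, action: str, **details) -> str:
    """Serialize one timestamped audit event as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp per action
        "run_id": run_id,
        "action": action,  # e.g. navigation, download, ocr, upload
        **details,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # The tender ID below is a made-up placeholder.
    keys = s3_keys("GEM/2025/B/0000001", datetime(2025, 12, 1))
    logging.info(audit_record("run-001", "upload", key=keys["pdf"]))
```

Keeping the PDF and its JSON under the same per-tender prefix makes the spot check easier, since each extracted record sits next to the file it came from, and one JSON line per action keeps the audit log easy to filter by run.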
Apply with a concise yet detailed project proposal outlining the approach, timelines, and any prior Playwright or OCR projects you have shipped. Links to repos, demos, or client references will help me gauge fit quickly. I’m open to working on a fixed fee for the MVP or moving hourly once the scope is solid.