Scrape Government Public Records PDFs

Budget: $30

The target sites are government portals that publish public records but do not offer bulk-download options. I need an automated solution that can search by number on the page and download each file in its native PDF form.

Here is what I am after:

• A repeatable Python scraper capable of searching within a specific domain, following pagination, and collecting every accessible PDF link.
• The script should save the PDFs locally in a clear folder structure (site / year / category).
• A simple log or CSV report listing the URL, document title, and download status for every file processed.

Acceptance criteria

1. All public records published in the specified date span are present as intact PDFs.
2. The log matches the count of files actually downloaded.

Please make sure the code is well commented and easy for me to rerun whenever new records are released. If you have dealt with anti-bot mechanisms on government sites before, let me know: some domains may throttle requests or deploy basic captchas, and I want the scraper to handle those gracefully while remaining compliant with each site's terms of use.
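To make the deliverable concrete, here is a rough sketch of the storage and reporting side, using only the Python standard library. Everything named in it is an assumption for illustration: the function names, the `url` / `title` / `status` column names, the `User-Agent` string, the retry counts, and the choice to treat HTTP 429 and 503 as throttling signals. Search, pagination, and link collection are site-specific and left out, and captchas are not handled here.

```python
import csv
import re
import time
import urllib.error
import urllib.request
from pathlib import Path

LOG_FIELDS = ["url", "title", "status"]  # assumed report columns


def safe_name(text: str) -> str:
    """Reduce a title or site name to a filesystem-safe path component."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("._")
    return cleaned or "untitled"


def target_path(root: Path, site: str, year: int, category: str, title: str) -> Path:
    """Build root/site/year/category/title.pdf."""
    return root / safe_name(site) / str(year) / safe_name(category) / f"{safe_name(title)}.pdf"


def looks_like_pdf(data: bytes) -> bool:
    """Cheap integrity check: PDF header at the start, EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def fetch(url: str, retries: int = 3, delay: float = 2.0) -> bytes:
    """Download a URL, backing off exponentially when the server throttles."""
    for attempt in range(retries):
        try:
            request = urllib.request.Request(url, headers={"User-Agent": "records-scraper-sketch"})
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Assumption: 429 and 503 mean "slow down"; anything else is fatal.
            if err.code not in (429, 503) or attempt == retries - 1:
                raise
            time.sleep(delay * 2 ** attempt)
    raise RuntimeError("retries must be at least 1")


def log_row(log_file: Path, url: str, title: str, status: str) -> None:
    """Append one row to the CSV report, writing the header on first use."""
    new_file = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"url": url, "title": title, "status": status})


def save_record(root, log_file, site, year, category, title, url, fetcher=fetch) -> str:
    """Fetch one document, store it only if it looks like a PDF, and log the outcome."""
    try:
        data = fetcher(url)
        status = "ok" if looks_like_pdf(data) else "not_a_pdf"
    except Exception as err:  # record the failure instead of stopping the run
        data, status = b"", f"error: {err}"
    if status == "ok":
        dest = target_path(root, site, year, category, title)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
    log_row(log_file, url, title, status)
    return status


def log_matches_disk(root: Path, log_file: Path) -> bool:
    """Compare the number of 'ok' log rows with the PDFs found under root."""
    with log_file.open(newline="", encoding="utf-8") as handle:
        ok_rows = sum(1 for row in csv.DictReader(handle) if row["status"] == "ok")
    return ok_rows == sum(1 for _ in root.rglob("*.pdf"))
```

A run would call `save_record` once per collected link and finish with `log_matches_disk` as a check against the second acceptance criterion. Note that two documents with the same site, year, category, and title would overwrite each other in this sketch, so a real script would need a uniqueness rule such as appending the record number to the file name.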

Python
