Ongoing Python Scraper Maintenance

Client: AI | Posted: 15.01.2026
Budget: $25

I run several production-grade Python scripts on AWS EC2 that pull market-price data from up to five different websites and store the results in a SQL database. The scrapers are already built and live; what I need now is reliable, ongoing management and quick-turn updates whenever a source site changes its layout or blocking rules. The work centres on data-scraping updates, so you should feel comfortable tracing XPath/CSS changes, adjusting anti-bot measures, and ensuring the cron-driven jobs keep delivering fresh data. When an update is pushed, the pipeline must still land clean, well-typed rows in the existing database schema without breaking downstream queries (a minimal sketch of this kind of change follows below).

Typical tasks
• Diagnose and fix failed scrape runs
• Modify or extend selectors when sites update HTML or add new price fields
• Optimise or rewrite scrapers in Python (requests, BeautifulSoup, Selenium or similar)
• Tweak SQL queries/stored procedures to reflect any column or type changes
• Deploy changes to the live AWS environment (Git pull, virtualenv, cron, CloudWatch metrics)

Acceptance criteria
• All scheduled jobs complete without fatal errors for seven consecutive days after each patch
• New or changed price fields appear correctly in the SQL tables and match spot-checks on the source sites
• CloudWatch alarms stay green (no elevated error rate or runtime overage)

If you’re comfortable jumping into existing Python code, working directly on an AWS box, and turning around fixes quickly, I’ll supply repo access and test credentials so we can get started right away.
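
To make the scope concrete, here is a minimal sketch of the kind of patch a selector update involves: pull one price with requests/BeautifulSoup and land a typed row in SQL. The URL, CSS selector, table name, and column types are illustrative assumptions, not the real sites or schema, and sqlite3 stands in for the actual database backend.

    # Minimal illustrative sketch only; URL, selector, and schema are assumptions.
    import sqlite3  # stand-in for the real SQL backend
    from datetime import datetime, timezone

    import requests
    from bs4 import BeautifulSoup

    SOURCE_URL = "https://example.com/product/123"  # hypothetical source page
    PRICE_SELECTOR = "span.price"                   # hypothetical selector to adjust when the layout changes

    def fetch_price(url: str) -> float:
        # Fail loudly if the selector stops matching, so the cron run logs a clear error.
        resp = requests.get(url, timeout=30, headers={"User-Agent": "Mozilla/5.0"})
        resp.raise_for_status()
        node = BeautifulSoup(resp.text, "html.parser").select_one(PRICE_SELECTOR)
        if node is None:
            raise ValueError(f"selector {PRICE_SELECTOR!r} matched nothing; layout may have changed")
        return float(node.get_text(strip=True).lstrip("$").replace(",", ""))

    def store_price(source: str, price: float) -> None:
        # Insert a well-typed row; the 'prices' table here is illustrative only.
        with sqlite3.connect("prices.db") as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS prices (source TEXT, price REAL, scraped_at TEXT)"
            )
            conn.execute(
                "INSERT INTO prices (source, price, scraped_at) VALUES (?, ?, ?)",
                (source, price, datetime.now(timezone.utc).isoformat()),
            )

    if __name__ == "__main__":
        store_price(SOURCE_URL, fetch_price(SOURCE_URL))

In the live environment the same sort of change would go out via Git pull into the existing virtualenv and be verified by watching the next cron runs and the CloudWatch metrics, per the acceptance criteria above.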