E-commerce AI Scraper Build

Budget: $750

I need a robust AI-driven scraper that reliably pulls product text, images, SKUs, and pricing from selected e-commerce websites and our own admin-controlled catalog portals. The script must handle pagination, dynamic content, and, where required, authenticated sessions without manual intervention.

Core expectations
• Extract and store: product titles/descriptions, all associated images, SKU codes, and current prices.
• Export: structured CSV or JSON for data, separate folder (or S3 bucket) for images, with clear file naming that links each image back to its SKU.
• Tech stack: Python with libraries such as Scrapy, Playwright/Selenium, BeautifulSoup, or a comparable approach; whatever you can prove is most efficient and resilient. Basic computer-vision or OCR hooks are welcome if they improve image handling.
• Reliability: graceful error handling, automatic retries, and a simple log file so I can trace any failed requests.
• Modularity: the list of target domains should live in a config file; adding a new site shouldn’t require rewriting core logic.
• Documentation: brief setup guide plus inline comments so another developer can maintain the code.

Self-hosted.

Acceptance criteria
1. 98%+ extraction accuracy across a test set of 500 products.
2. No duplicate entries in the output.
3. Script completes a full run on at least two different sites without manual fixes.

When you respond, focus on your experience with similar e-commerce or catalog scraping projects and the tools you prefer for headless browsing, concurrency control, and anti-bot mitigation. A concise overview of one or two past successes is enough; I’m mainly interested in proof that you can deliver a clean, maintainable solution on the first pass.
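
To illustrate the export requirements (image file names that link back to the SKU, and no duplicate entries), here is a minimal Python sketch. The record fields (`sku`, `image_urls`), the `{sku}_{index}{ext}` naming pattern, the `.jpg` fallback, and all three helper functions are assumptions made for illustration; they are not part of the spec, and the actual scheme is open to proposal.

```python
import json
import os
import re
from urllib.parse import urlparse


def image_filename(sku: str, index: int, url: str) -> str:
    """Build a file name that ties an image back to its SKU (hypothetical pattern)."""
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"  # assumed fallback
    # Replace characters that are unsafe in file names or S3 keys.
    safe_sku = re.sub(r"[^A-Za-z0-9_-]", "_", sku)
    return f"{safe_sku}_{index:02d}{ext}"


def dedupe_by_sku(products):
    """Keep the first record seen for each SKU, preserving input order."""
    seen = set()
    unique = []
    for product in products:
        if product["sku"] in seen:
            continue
        seen.add(product["sku"])
        unique.append(product)
    return unique


def export_records(products) -> str:
    """Return a JSON string of de-duplicated records with SKU-linked image names."""
    records = []
    for product in dedupe_by_sku(products):
        record = dict(product)
        record["image_files"] = [
            image_filename(product["sku"], i, url)
            for i, url in enumerate(product["image_urls"], start=1)
        ]
        records.append(record)
    return json.dumps(records, indent=2)
```

With this naming, every image for a product sorts together under its SKU, and the `image_files` list in the JSON gives an explicit mapping back from each record to its files.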
