Automated Product Info Extraction Pipeline

Client: AI | Published: 31.12.2025
Budget: $60

I already have a clean JSON file that holds the product names to be monitored. What I need is an end-to-end pipeline that, on every run, will:

• Open a Google-only search for each product name
• Capture the first eight organic result URLs in ranking order
• Visit those pages, pull the raw HTML, and pass it to Gemini or the ChatGPT API so the model can reliably extract price, currency, brand, description and any stated delivery time
• Write the collected data to both CSV and JSON, saving fresh files after every run and tagging them with a timestamp
• Survive the real-world web: captchas, bot checks, rate limits, timeouts, pages that hold multiple variants, or sites that omit a field entirely all need graceful handling and clear logging

Please package the code so I can spin it up on my own machine or a small cloud instance, tweak the schedule in one place, and swap API keys or proxy settings without digging through the logic.

Well-documented source, a small sample set, .env, and a quick README that shows how to run a manual test round will be the acceptance criteria.
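
To make the output requirement concrete, here is a minimal sketch of how the timestamped CSV and JSON files could be written on each run, with fields that a site omits stored as empty values rather than causing a failure. It uses only the Python standard library. The names `FIELDS`, `normalize`, `write_outputs`, the `products_` file prefix, and the extra `product` and `url` columns are assumptions for illustration, not part of the brief.

```python
from __future__ import annotations

import csv
import json
from datetime import datetime, timezone
from pathlib import Path

# Extraction fields named in the brief. "product" and "url" are an assumption,
# added here so each row can be traced back to its search and source page.
FIELDS = ["product", "url", "price", "currency", "brand", "description", "delivery_time"]


def normalize(record: dict) -> dict:
    """Return a row with every expected field; omitted fields become None."""
    return {field: record.get(field) for field in FIELDS}


def write_outputs(records: list[dict], out_dir: str,
                  now: datetime | None = None) -> tuple[Path, Path]:
    """Write one CSV and one JSON file, both tagged with the same UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    rows = [normalize(r) for r in records]

    json_path = target / f"products_{stamp}.json"
    json_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2),
                         encoding="utf-8")

    csv_path = target / f"products_{stamp}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells
    return csv_path, json_path
```

Calling `write_outputs(records, "output")` at the end of a run would produce a fresh pair of files each time, so earlier runs are never overwritten. Passing `now` explicitly is only there to make the function easy to test.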