Automated Website Scraping Solution

I’ve been manually collecting product information from an online catalogue and now want the entire process fully automated. The goal is a repeatable script that captures every piece of relevant content on the site (text, product images, and any internal or external links), then saves it in a tidy, structured format I can work with straight away.

Here’s what I need you to build and hand over:

• A scraping script (Python preferred: BeautifulSoup, Scrapy, or Selenium if the pages are dynamic) that logs in if required, navigates through all catalogue sections, and pulls text, images, and links without missing hidden or paginated items.
• Clean output: text and links in CSV or JSON, images downloaded into organised folders with filenames that reference their corresponding records.
• A simple configuration file or clear variables so I can adjust the scraping frequency later (daily, weekly, or monthly) without touching core code.
• Basic error handling: retries on time-outs, polite throttling so the site isn’t overwhelmed, and clear logging so I can see what was scraped and spot failures quickly.
• Setup notes and a brief walkthrough so I can schedule the job via cron or Windows Task Scheduler on my end.

You’ll be free to choose the most efficient libraries and methods, as long as the final solution runs from the command line and can be deployed on a standard VPS. Let me know your preferred stack and any questions about the site structure, and we can get started right away.
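
To make the requirements above concrete, here is a rough sketch of the kind of thing I have in mind for retries, throttling, logging, and structured output. It is only an illustration, not a required design: it uses Python's standard library (`urllib`, `html.parser`) rather than BeautifulSoup or Scrapy so that it runs without extra installs, and the names `CONFIG`, `PageParser`, `parse_page`, and `fetch`, as well as the example URL, are placeholders I made up. Login, pagination, and image downloading are left out.

```python
import json
import logging
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("catalogue_scraper")

# Hypothetical settings; in a real project these would live in a separate config file.
CONFIG = {
    "delay_seconds": 1.0,    # base pause used for polite back-off between attempts
    "max_retries": 3,        # attempts per page before giving up
    "timeout_seconds": 10,   # per-request timeout
}


class PageParser(HTMLParser):
    """Collects text nodes, link targets, and image sources from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text, self.links, self.images = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Resolve relative URLs against the page URL so records are self-contained.
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_data(self, data):
        # Note: this also picks up script/style contents; a real scraper would filter those.
        if data.strip():
            self.text.append(data.strip())


def parse_page(html, base_url):
    """Turn raw HTML into a JSON-serialisable record."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "text": " ".join(parser.text),
            "links": parser.links, "images": parser.images}


def fetch(url, config=CONFIG):
    """Download a page, retrying with a growing pause; returns None on failure."""
    for attempt in range(1, config["max_retries"] + 1):
        try:
            with urllib.request.urlopen(url, timeout=config["timeout_seconds"]) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            log.warning("attempt %d/%d failed for %s: %s",
                        attempt, config["max_retries"], url, exc)
            time.sleep(config["delay_seconds"] * attempt)
    log.error("giving up on %s", url)
    return None


if __name__ == "__main__":
    # Offline demonstration on an inline snippet; the URL is a placeholder.
    sample = '<h1>Widget</h1><p>Blue <a href="/specs">specs</a></p><img src="img/w.png">'
    print(json.dumps(parse_page(sample, "https://example.com/catalogue/item1"), indent=2))
```

In the finished solution I would expect `fetch` and `parse_page` (or their equivalents in whichever library you choose) to be driven by a loop over catalogue pages, with the settings read from a config file so the schedule and delays can be changed without editing the code.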

Python
