I need a self-contained script that runs once every 24 hours, crawls the web for domains I haven't seen before, and then looks on each newly discovered host for publicly accessible text or PHP files. Every domain the crawler uncovers is in scope; I'm not excluding anything and I'm not supplying a predefined list.

Here's the flow I have in mind:

1. Start with web crawling to spot fresh domains.
2. For each new host, probe common paths and directories for *.txt, *.php, and other text-based files.
3. Copy any matches to local storage (or S3 if it's just a quick config change), organising them by date and domain name.
4. Log everything (first-seen timestamp, checked URLs, files saved, and errors) into a simple SQLite DB or CSV so I can review activity later.
5. Optionally, send a short daily summary (email or Slack) with counts of domains found and files retrieved.

Deliverables

• Well-commented source code (Python preferred, but I'm open) with a requirements.txt
• Step-by-step setup notes for cron (or Windows Task Scheduler) so the job runs automatically each day
• Sample run output demonstrating at least one domain processed and files stored correctly
• A concise README explaining how to add or change file-type filters in the future

I'll mark the job complete once the script is running on my VPS, successfully detects new domains, grabs text/PHP files, and writes a clean log without manual intervention.
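To make step 4 concrete, here is a minimal sketch of what the SQLite logging table could look like, using only Python's standard-library sqlite3 module. The table and column names (crawl_log, first_seen, saved_path, and so on) are hypothetical placeholders, not part of the spec, and the final schema is up to the implementer.

```python
# Sketch of the step-4 activity log, assuming SQLite via the standard library.
# Table/column names here are illustrative placeholders only.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("crawler.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS crawl_log (
           domain      TEXT NOT NULL,
           first_seen  TEXT NOT NULL,   -- ISO-8601 UTC timestamp
           url         TEXT NOT NULL,   -- URL that was checked
           saved_path  TEXT,            -- local path if a file was stored
           error       TEXT             -- error message, if any
       )"""
)

# Record one example event (hypothetical values).
now = datetime.now(timezone.utc).isoformat()
conn.execute(
    "INSERT INTO crawl_log (domain, first_seen, url, saved_path, error) "
    "VALUES (?, ?, ?, ?, ?)",
    ("example.com", now, "https://example.com/robots.txt",
     "2024-01-01/example.com/robots.txt", None),
)
conn.commit()

# Reviewing activity later is then a plain SELECT.
rows = conn.execute("SELECT domain, url FROM crawl_log").fetchall()
print(rows[0][0])  # → example.com
```

One table keyed by domain and timestamp is enough to answer all four review questions in step 4 (first seen, URLs checked, files saved, errors) with simple queries, and it swaps cleanly for a CSV writer if that's preferred.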
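For the cron deliverable, the setup notes could centre on a single crontab entry like the one below. The install location (/opt/crawler), the virtualenv path, and the log file are assumptions to be adapted to the actual VPS layout.

```shell
# Run the crawler once a day at 03:15 server time.
# Paths are placeholders; adjust to where the script and venv actually live.
15 3 * * * /opt/crawler/venv/bin/python /opt/crawler/crawler.py >> /var/log/crawler.log 2>&1
```

Installed via `crontab -e`; the `>> ... 2>&1` redirection appends both stdout and stderr to one log file, so unattended failures stay visible without manual intervention.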