Enterprise Real-Estate Scraper System

Client: AI | Published: 28.11.2025

I need an enterprise-grade web-scraping solution that can continuously collect and normalise construction and real-estate intelligence from a wide mix of sources. Priority targets include mainstream real-estate portals, major construction-company sites and a full spectrum of municipal approval websites such as MahaRERA, BMC/MCGM, PMC Pune, BBMP Bengaluru and MMRDA, along with tender boards, commencement-certificate listings and related public-infrastructure databases.

The platform must handle high volume, frequent structural changes and the usual defensive measures (CAPTCHAs, rotating user-agents, IP rate limits) while still delivering clean, deduplicated data ready for downstream analytics. Think of a modular crawler farm with proxy rotation, headless-browser fall-backs and scheduled incremental updates that funnel into a single, well-documented data store (SQL, NoSQL or a data lake; I am open to your recommendation).

Key deliverables
• Scraper modules for every portal mentioned, coded for easy maintenance and rapid addition of new sites
• Automated parsing, validation and normalisation so that every data point (approvals, tenders, floor plans, parking allocation, launch dates) lands in a consistent schema
• Robust logging, alerting and retry logic so I know instantly if a source or selector breaks
• Documentation and a hand-over walkthrough ensuring the system can be operated and extended by an internal data team

Data Required Per Project
Field | Required
Project name, address | ✔
Builder name, address | ✔
Location | ✔
Approval / status | ✔
RERA ID |
Decision-maker contact / phone / email | If public
PDF links |
Multi-level or mechanised car parking | Required, wherever it is mentioned in the documents

You must scan project details for the following parking keywords:
• stack parking
• puzzle parking
• mechanised/mechanical parking
• tower parking
• automatic parking
• rotary/robotic parking
• pit parking
• circulation parking
• multi-level parking

If you’ve previously built large-scale crawlers or data pipelines in Python, Node, Go or a comparable stack and can demonstrate reliability at scale, let’s discuss your approach and timeline. The system will run continuously, so I am looking for someone cost-efficient, reliable and proactive.
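The brief lists IP rate limits among the defensive measures to handle. One way is to stay under those limits rather than trip them, by giving each host its own token bucket. Below is a minimal, thread-safe sketch of that idea. `HostRateLimiter` and its `rate`/`burst` parameters are hypothetical names. The clock and sleep functions are injectable so the behaviour can be tested without real waiting.

```python
import threading
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit


class HostRateLimiter:
    """Token bucket per host: `rate` requests per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int = 1,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst >= 1")
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._sleep = sleep
        # host -> (available tokens, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}
        self._lock = threading.Lock()

    def acquire(self, url: str) -> float:
        """Block until a request to url's host is allowed; return seconds waited."""
        host = urlsplit(url).hostname or ""
        waited = 0.0
        while True:
            with self._lock:
                now = self._clock()
                tokens, last = self._buckets.get(host, (float(self.burst), now))
                # Refill in proportion to elapsed time, never above the burst size.
                tokens = min(float(self.burst), tokens + (now - last) * self.rate)
                if tokens >= 1.0:
                    self._buckets[host] = (tokens - 1.0, now)
                    return waited
                self._buckets[host] = (tokens, now)
                wait = (1.0 - tokens) / self.rate
            # Sleep outside the lock so requests to other hosts are not blocked.
            self._sleep(wait)
            waited += wait
```

With `rate=2.0, burst=1`, a second request to the same host waits 0.5 s. A request to a different host proceeds immediately.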
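The brief asks for a "modular crawler farm" with "headless-browser fall-backs" and "rapid addition of new sites". One common pattern for this has two parts:

• a registry of per-source parser classes
• a runner that tries a cheap HTTP fetch first and switches to a headless browser only when nothing could be parsed

The sketch below works under those assumptions. `register`, `SourceScraper`, `run_source` and the toy `example_portal` parser are all hypothetical. Both fetchers are injected callables; in practice they might wrap an HTTP client and a Playwright or Selenium session.

```python
from typing import Callable, Dict, List, Type

Fetcher = Callable[[str], str]  # url -> page content
SCRAPERS: Dict[str, Type["SourceScraper"]] = {}


def register(name: str):
    """Class decorator: adding a portal means one subclass plus one decorator line."""
    def wrap(cls):
        SCRAPERS[name] = cls
        return cls
    return wrap


class SourceScraper:
    def parse(self, page: str) -> List[dict]:
        raise NotImplementedError


def run_source(name: str, url: str, static_fetch: Fetcher,
               headless_fetch: Fetcher) -> List[dict]:
    scraper = SCRAPERS[name]()
    records = scraper.parse(static_fetch(url))  # cheap plain-HTTP attempt first
    if records:
        return records
    # Nothing parsed: the page may be rendered client-side, so retry once
    # through the slower headless-browser fetcher.
    return scraper.parse(headless_fetch(url))


@register("example_portal")
class ExamplePortalScraper(SourceScraper):
    """Toy parser for illustration: one project per line starting with 'PROJECT:'."""

    def parse(self, page: str) -> List[dict]:
        return [{"project_name": line.split(":", 1)[1].strip()}
                for line in page.splitlines() if line.startswith("PROJECT:")]
```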
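Scheduled incremental updates need a way to tell which pages actually changed since the last crawl. The sketch below fingerprints the parsed record with SHA-256 over canonical JSON. It deliberately hashes the record rather than the raw HTML, because ads and timestamps change the HTML on every fetch. `ChangeTracker` is a hypothetical name, and its in-memory dict stands in for a table in whichever data store is chosen.

```python
import hashlib
import json
from typing import Dict


class ChangeTracker:
    """Remember one fingerprint per URL and report only new or changed records."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # stand-in for a persistent table

    @staticmethod
    def fingerprint(record: dict) -> str:
        # Canonical JSON (sorted keys, fixed separators) so key order does not matter.
        canonical = json.dumps(record, sort_keys=True, ensure_ascii=False,
                               separators=(",", ":"), default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_changed(self, url: str, record: dict) -> bool:
        """True the first time a URL is seen or whenever its record differs."""
        fp = self.fingerprint(record)
        if self._seen.get(url) == fp:
            return False
        self._seen[url] = fp
        return True
```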
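For "clean, deduplicated data", records arriving from several portals for the same project need a stable key. The approach below is an assumption, not something the brief prescribes:

• key on the RERA ID when one is present
• otherwise key on a hash of the case- and whitespace-normalised project name, builder and location
• let later duplicates only fill fields that are still empty

`dedup_key` and `merge_records` are hypothetical helpers.

```python
import hashlib
from typing import Dict, Iterable, List

EMPTY = (None, "", [])


def _norm(value) -> str:
    # Case-fold and collapse whitespace so "Lake  View" matches "lake view".
    return " ".join(str(value or "").lower().split())


def dedup_key(rec: dict) -> str:
    rera = _norm(rec.get("rera_id")).upper()
    if rera:
        return "rera:" + rera
    basis = "|".join(_norm(rec.get(k)) for k in ("project_name", "builder_name", "location"))
    return "sha:" + hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]


def merge_records(records: Iterable[dict]) -> List[dict]:
    """Collapse duplicates; the first non-empty value seen for each field wins."""
    merged: Dict[str, dict] = {}
    for rec in records:
        key = dedup_key(rec)
        if key not in merged:
            merged[key] = dict(rec)
            continue
        target = merged[key]
        for field_name, value in rec.items():
            if value not in EMPTY and target.get(field_name) in EMPTY:
                target[field_name] = value
    return list(merged.values())
```

One limitation: if one source reports a RERA ID and another does not, the two records land under different keys. Joining those would need a second pass, for example fuzzy matching on name and location.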
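The "Data Required Per Project" table maps naturally onto a typed record with a validation step. This sketch uses a Python dataclass, and every field name in it is hypothetical. The validation rules are assumptions for illustration:

• the ✔ rows are mandatory
• emails get a basic sanity check
• PDF links must be absolute URLs

A production schema would live in the chosen data store.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical field names for the rows marked ✔ in the brief.
REQUIRED_FIELDS = ("project_name", "project_address", "builder_name",
                   "builder_address", "location", "approval_status")


@dataclass
class ProjectRecord:
    project_name: str
    project_address: str
    builder_name: str
    builder_address: str
    location: str
    approval_status: str
    # Remaining rows: optional, or collected only when public.
    rera_id: Optional[str] = None
    contact_phone: Optional[str] = None
    contact_email: Optional[str] = None
    pdf_links: List[str] = field(default_factory=list)
    parking_keywords: List[str] = field(default_factory=list)
    source_url: Optional[str] = None

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the record passes."""
        errors = [f"missing {name}" for name in REQUIRED_FIELDS
                  if not getattr(self, name).strip()]
        if self.contact_email and "@" not in self.contact_email:
            errors.append("malformed contact_email")
        for link in self.pdf_links:
            if not link.lower().startswith(("http://", "https://")):
                errors.append(f"non-URL pdf link: {link}")
        return errors
```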
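Launch and approval dates appear in many formats across portals, and the brief wants them in a consistent schema. The sketch below normalises them to ISO 8601 using only the standard library. The format list is an assumption: numeric dates are read day-first, as is usual on Indian sites. Month-only values map to the first of the month, and unparseable values come back as `None` for review rather than being guessed. `normalise_date` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

# Assumed formats; extend per source as new variants turn up.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%Y-%m-%d",
                "%d %b %Y", "%d %B %Y", "%b %Y", "%B %Y")


def normalise_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = " ".join(raw.replace(",", " ").split())  # "12 Jan, 2026" -> "12 Jan 2026"
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave for manual review instead of guessing
```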
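The retry requirement is commonly met with exponential backoff plus jitter, escalating to an alert once retries run out. In this sketch, `with_retries` and its `on_give_up` hook are hypothetical names; the hook is where an email, chat or pager notification could be wired in. In practice the bare `Exception` catch should be narrowed to network and parse errors.

```python
import logging
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("scraper")


def with_retries(fn: Callable[[], T], *, attempts: int = 4, base_delay: float = 1.0,
                 max_delay: float = 60.0,
                 on_give_up: Optional[Callable[[Exception], None]] = None,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying with exponential backoff and jitter; alert when exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # narrow to network/parse errors in real code
            if attempt == attempts:
                log.error("giving up after %d attempts: %r", attempts, exc)
                if on_give_up is not None:
                    on_give_up(exc)  # e.g. push to an alerting channel
                raise
            # base_delay, 2x, 4x, ... capped at max_delay, then scaled by jitter in [0.5, 1.0]
            delay = min(max_delay, base_delay * 2 ** (attempt - 1)) * random.uniform(0.5, 1.0)
            log.warning("attempt %d/%d failed (%r); retrying in %.1fs",
                        attempt, attempts, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```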
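Retries catch transient failures, but a broken selector usually shows up as a successful HTTP response with nothing extractable. To know "instantly if a source or selector breaks", each run can be checked for record count and required-field fill rate. This sketch assumes each source run yields a list of dicts. `check_source_health` and the field names are hypothetical, and the thresholds (at least one record, 80% fill) are illustrative defaults rather than values from the brief.

```python
from typing import Dict, List, Sequence

# Hypothetical required-field names; align with the real schema.
REQUIRED = ("project_name", "builder_name", "location", "approval_status")


def check_source_health(source: str, records: Sequence[Dict], min_records: int = 1,
                        min_fill: float = 0.8) -> List[str]:
    """Return alert messages for one source run; an empty list means healthy."""
    floor = max(min_records, 1)
    if len(records) < floor:
        return [f"{source}: only {len(records)} records extracted (expected >= {floor})"]
    alerts = []
    for name in REQUIRED:
        filled = sum(1 for r in records if str(r.get(name) or "").strip())
        rate = filled / len(records)
        if rate < min_fill:
            # A sudden drop in fill rate usually means a selector no longer matches.
            alerts.append(f"{source}: '{name}' filled in {rate:.0%} of records "
                          f"(threshold {min_fill:.0%})")
    return alerts
```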
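The parking-keyword scan can be done with regular expressions over the extracted project text, whether it came from HTML or a PDF. This minimal sketch assumes text extraction has already happened. A few pieces are illustrative rather than part of the brief:

• the category names
• the extra spellings "mechanized" and "multilevel"
• the `scan_parking` function itself

```python
import re
from typing import Dict, List

# Phrases from the brief; "/" alternatives are split into separate variants.
# "mechanized" and "multilevel" are assumed extra spellings.
PARKING_KEYWORDS: Dict[str, List[str]] = {
    "stack": ["stack parking"],
    "puzzle": ["puzzle parking"],
    "mechanised": ["mechanised parking", "mechanized parking", "mechanical parking"],
    "tower": ["tower parking"],
    "automatic": ["automatic parking"],
    "rotary_robotic": ["rotary parking", "robotic parking"],
    "pit": ["pit parking"],
    "circulation": ["circulation parking"],
    "multi_level": ["multi-level parking", "multilevel parking"],
}


def _to_pattern(phrase: str) -> "re.Pattern[str]":
    # Words may be separated by any whitespace (incl. PDF line breaks) or hyphens;
    # \b anchors stop matches inside longer words.
    words = re.split(r"[\s-]+", phrase)
    body = r"[\s-]+".join(re.escape(w) for w in words)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


COMPILED = {cat: [_to_pattern(p) for p in phrases]
            for cat, phrases in PARKING_KEYWORDS.items()}


def scan_parking(text: str) -> Dict[str, List[str]]:
    """Return {category: [matched snippets]} for each category found in text."""
    hits: Dict[str, List[str]] = {}
    for cat, patterns in COMPILED.items():
        found = [m.group(0) for p in patterns for m in p.finditer(text)]
        if found:
            hits[cat] = found
    return hits
```

Flexible separators catch phrases split across PDF line breaks. However, the scan does not understand negation ("no pit parking"), so hits are best stored as evidence snippets for review rather than as a final yes/no.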