Marketplace Scraping

Customer: AI | Published: 05.02.2026
Budget: $5,000

1 Introduction and Purpose

This document is issued to invite qualified development vendors to submit a fixed-price commercial proposal, together with an annual maintenance contract (AMC), for the end-to-end implementation of the Affordable Classics Automation Platform. The platform is a production-grade system designed to automate controlled account warming, marketplace listing extraction, and selective deep excavation of listings, all orchestrated by a scheduler and managed through a web-based administrative frontend. The intent of this document is to clearly explain what is to be built, what work has already been completed, and what responsibilities the selected vendor will assume.

2 Engagement Model and Commercial Expectations

The engagement is expected to be delivered by a single vendor on a fixed-price basis, covering the complete system end to end. The vendor will also propose a separate AMC for post-go-live support and maintenance. The selected vendor will be responsible for development, integration, testing, deployment, and stabilization of the system in a production environment. The solution is not a proof of concept or a prototype; it is intended for live production use from day one.

Deployment will be performed on AWS EC2 infrastructure provided by the client. The vendor is expected to handle all application-level deployment, runtime configuration, and process management on that environment.

All outbound and inbound connections between the system and FB shall be routed exclusively through a WireGuard VPN tunnel terminating on a residential internet connection. This VPN tunnel will be initiated from the execution environment and will be used for all FB-related traffic generated by the Warming, Extraction, and Excavation modules. The WireGuard VPN infrastructure itself will be set up and provided by the client.
The selected vendor will be required to coordinate with the client during implementation to agree on technical details such as port usage, routing configuration, and any application-level adjustments required to ensure all traffic is correctly routed through the VPN tunnel. The vendor is not responsible for procuring or operating the residential internet connection but must ensure the system is compatible with and correctly utilizes the provided WireGuard setup.

The target delivery timeline for the complete system is between eight and twelve weeks.

3 Work Already Completed and Materials Provided

A substantial amount of work has already been completed and will be provided to the selected vendor to accelerate delivery and remove ambiguity. This includes a finalized Master Specification, complete and detailed BRDs for the Warming Module, Extraction Module, Excavation Module, and Frontend, as well as multiple normative appendices covering persistent browser configuration, behavioral profile definitions, deduplication strategy, marketplace tile extraction rules, and title parsing logic.

In addition, working proof-of-concept scripts for all three execution modules (Warming, Extraction, and Excavation) have already been built. These scripts validate the intended logic and flow of each module but are not production-grade. The vendor is expected to rewrite these into robust, maintainable, production-quality code in line with the specifications.

4 Overall System Responsibility

The vendor will be responsible for delivering the complete system as an integrated whole. This includes the scheduler/orchestration layer, all three execution modules, the administrative frontend, database schema and migrations, concurrency handling, logging, error handling, and deployment. The vendor is expected to follow the supplied specifications closely rather than redesigning system behavior. Any deviations or assumptions must be explicitly stated in the proposal.
5 Scheduler and Orchestration Layer

The scheduler is responsible solely for orchestration and control. It does not perform any interactions itself. Its role is to trigger executions of the Warming, Extraction, and Excavation modules based on configuration values set in the frontend, including enable/disable toggles, concurrency settings, cooldown rules, and circadian sleep windows.

When concurrency mode is enabled for Warming or Extraction, the scheduler must start the first process immediately and start the second process after a randomly selected delay between two and five minutes. After both concurrent executions complete, the scheduler must wait for a randomly selected gap of one to five minutes before starting the next cycle.

For the Excavation process, the scheduler must enforce batching logic. If the number of listings selected by the user exceeds five, the scheduler must automatically split them into batches of five (with the final batch possibly smaller) and process these batches sequentially.

The scheduler must be configuration-driven, deterministic in control flow, and must not maintain hidden state between cycles.

6 Warming Module

The Warming Module is responsible for safely increasing account credibility through controlled, human-like usage. Each warming run processes exactly one account using a persistent browser profile permanently bound to that account. The module must attempt cookie-based login first and fall back to credential-based login with two-factor authentication handling if required.

All behavior during a warming session is governed by the assigned behavioral profile. This includes scrolling, clicking, pausing, typing, idle behavior, and navigation tendencies. Randomness is permitted only within bounded ranges defined by the profile.

During a warming session, the module performs a variable sequence of two to four navigation paths selected from homepage browsing, group browsing, marketplace browsing, and friends browsing.
Marketplace interaction always includes the Vehicles category and one additional weighted category. Optional posting may occur using a pre-approved asset library. The Warming Module does not extract marketplace data. Credibility gain is based solely on the duration of successful, uninterrupted, authenticated usage. Deterministic account locking, cooldown application, and cleanup behavior are mandatory.

7 Extraction Module

The Extraction Module is responsible for collecting marketplace listing tiles at scale without performing deep inspection. Each extraction run processes exactly one account using its persistent browser profile and assigned behavioral profile. The module navigates marketplace views and extracts listing tiles only, without opening listing detail pages.

Extracted listings are deduplicated according to the defined strategy and stored in the database. URL locking must be applied to prevent concurrent processing of the same listings. The Extraction Module does not perform excavation, scoring, or enrichment. Credibility updates are applied strictly according to the extraction-safe rules defined in the BRD.

8 Excavation Module

The Excavation Module performs deep extraction on a limited, explicitly selected set of listings. This module does not perform warming or extraction behavior. It operates only on listings selected by the user through the frontend.

Each excavation run may process a maximum of five listings using exactly one Account. If more listings are selected, batching must be applied automatically and processed sequentially. Excavation includes full description capture, image/video collection, seller information extraction, and structured attribute parsing. The module updates listing records only.

9 Frontend

The frontend provides administrative control and visibility into the system.
It allows administrators to enable or disable modules, configure concurrency settings, manage accounts and behavioral profiles, monitor runs, review logs, and select listings for excavation. The frontend is strictly a control and visibility layer. It must not duplicate backend business logic.

10 Deployment and Environment

The vendor will deploy the system on client-provided AWS EC2 infrastructure. This includes application setup, runtime configuration, and process supervision. Infrastructure costs, account sourcing, proxy or VPN provisioning, and ongoing operational execution are outside the scope of this engagement.

11 Annual Maintenance Contract (AMC)

The vendor must submit a separate AMC proposal covering post-production support, including bug fixes, minor enhancements, dependency updates, and defined support service levels.

12 Vendor Response Requirements

Vendors must submit a fixed-price quote, an implementation timeline, AMC pricing, team composition, and a clear list of assumptions or deviations from the provided specifications.
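
Illustrative note on batching: the rule stated in Sections 5 and 8 (selections larger than five are split into batches of five, the final batch possibly smaller, and processed sequentially) could be sketched roughly as below. This is a minimal sketch for discussion only and is not taken from the supplied specifications or proof-of-concept scripts; the names `split_into_batches` and `MAX_BATCH_SIZE` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Per-run listing cap stated in Sections 5 and 8 (hypothetical constant name).
MAX_BATCH_SIZE = 5


def split_into_batches(items: Sequence[T], batch_size: int = MAX_BATCH_SIZE) -> List[List[T]]:
    """Split items into consecutive batches, preserving the original order.

    Every batch holds batch_size items except possibly the last one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


# Example: twelve selected listings become batches of 5, 5, and 2,
# which a scheduler would then hand to runs one after another.
selected = [f"listing-{n}" for n in range(1, 13)]
for batch in split_into_batches(selected):
    print(len(batch), batch)
```

Because the function is pure and keeps the input order, the same selection always yields the same batches, which fits the requirement that the scheduler be deterministic in control flow and hold no hidden state between cycles.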