Law Firm Website Data Scraping

Budget: $15

I have a list of law-firm domains and I want all of their publicly visible “people” pages turned into a clean, structured dataset. Typical fields I expect to capture include the firm name, attorney or partner name, professional title, practice areas, email, direct phone, office location and the source-page URL, but I am happy to review any additional data points you can reliably extract.

You may use Python (Scrapy, BeautifulSoup, Selenium, or a similar framework) or another proven toolset so long as the code is well commented and can be rerun on my end. Please respect each site’s robots.txt, space requests sensibly to avoid throttling, and keep logs of any pages that return errors.

Deliverables
• An Excel or CSV file with one row per attorney and the agreed-upon columns
• The full scraping script(s) with setup instructions and a brief read-me
• Error log or “couldn’t scrape” list, if any

Acceptance criteria
– At least 95% of listed attorney pages successfully captured
– No duplicate rows; all columns correctly populated or marked “NA” when information is absent
– Code runs locally after following your instructions without additional troubleshooting on my side

Once the first firm is scraped to my satisfaction, the same workflow can be repeated across the remaining sites. I’ll share the domain list and any nuance about page structures as soon as we start.
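
To make the robots.txt and "no duplicates, NA when absent" requirements concrete, here is a minimal Python sketch of the checks I have in mind. It is an illustration only, not a required design: the column names, the helper names (is_allowed, clean_rows, to_csv) and the choice of firm + name + email as the duplicate key are all assumptions and can be changed once we agree on the final columns.

```python
import csv
import io
from urllib.robotparser import RobotFileParser

# Assumed column names, derived from the fields listed in this brief.
COLUMNS = ["firm_name", "attorney_name", "title", "practice_areas",
           "email", "direct_phone", "office_location", "source_url"]


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def clean_rows(rows):
    """Fill absent fields with "NA" and drop duplicate attorneys.

    The duplicate key (firm, name, email; case-insensitive) is an assumption.
    """
    seen = set()
    cleaned = []
    for row in rows:
        record = {}
        for col in COLUMNS:
            value = str(row.get(col) or "").strip()
            record[col] = value if value else "NA"
        key = (record["firm_name"].lower(),
               record["attorney_name"].lower(),
               record["email"].lower())
        if key in seen:
            continue  # same attorney already captured
        seen.add(key)
        cleaned.append(record)
    return cleaned


def to_csv(rows) -> str:
    """Serialize cleaned rows to CSV text, one row per attorney."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a real run, the robots.txt text would be downloaded once per domain, every people-page URL would be passed through is_allowed before it is requested, and a pause (for example time.sleep) would be placed between requests. The sketch leaves fetching and page parsing out because those depend on each site's structure.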

Python
