Work scope

Client: AI | Published: 30.11.2025

I need a meticulous data-gatherer to build a clean archive of investor-relations materials for 70 publicly listed companies. Your job is to visit each company’s IR page, trace every quarter from the latest release back to Q1 2021, and pull the full earnings presentation for each period. Most files are published as PDF, though an occasional HTML page or alternate format appears; wherever they sit, I expect the complete plain-text content to be extracted accurately.

I already use a strict naming convention and folder hierarchy, so each presentation must be saved as an individual .txt file, titled exactly as specified and nested by company → year-quarter. When everything is collected, compress the entire set into a single ZIP.

Speed is important but precision matters more. Before awarding the full assignment, I will give you 3–4 sample companies so we can confirm extraction quality, text fidelity, and file naming. Once the sample looks perfect we will move to all 70 firms.

Deliverables
• Plain-text earnings presentations for every quarter (latest → Q1 2021)
• Files named per my convention and stored by company/quarter
• One final ZIP containing the full, clean archive

If you are comfortable scraping IR sites, parsing PDFs, and automating bulk text extraction with tools such as Python, BeautifulSoup or similar, this will be straightforward. Accuracy, attention to detail, and consistent file structure are absolutely essential.
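
For illustration only, the quarter range, the company → year-quarter nesting, and the final ZIP step could be sketched in Python roughly as below. This is a minimal sketch, not the required implementation: the function names, the `2021-Q1` folder label, and the `COMPANY_YEAR_Qn.txt` file name are placeholder assumptions, since the actual naming convention is supplied separately. Scraping and PDF text extraction are deliberately left out.

```python
import zipfile
from pathlib import Path


def quarters_back_to(latest_year, latest_quarter, first_year=2021, first_quarter=1):
    """Yield (year, quarter) pairs from the latest quarter back to Q1 2021."""
    year, quarter = latest_year, latest_quarter
    while (year, quarter) >= (first_year, first_quarter):
        yield year, quarter
        quarter -= 1
        if quarter == 0:
            year, quarter = year - 1, 4


def target_path(root, company, year, quarter):
    # ASSUMPTION: folder label and file name are placeholders;
    # the real convention comes from the client.
    folder = f"{year}-Q{quarter}"
    name = f"{company}_{year}_Q{quarter}.txt"
    return Path(root) / company / folder / name


def save_text(root, company, year, quarter, text):
    """Write one presentation's extracted text to its own .txt file."""
    path = target_path(root, company, year, quarter)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def zip_archive(root, zip_path):
    """Compress every .txt file under root into a single ZIP, keeping the hierarchy."""
    root = Path(root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(root.rglob("*.txt")):
            zf.write(file, file.relative_to(root))
    return Path(zip_path)
```

For example, `list(quarters_back_to(2025, 3))` enumerates the 19 quarters from Q3 2025 down to Q1 2021, which gives a simple checklist for spotting missing periods per company before zipping.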