Text Document Cleaning Automation

Client: AI | Published: 16.01.2026

I have a collection of plain-text documents that need to go through a reliable clean-up routine. The files vary in size, but each one is a standard .txt file that suffers from the usual clutter: stray line breaks, repeated lines, scattered special characters, and inconsistent spacing.

Here’s what I want from you:

• A repeatable, script-based solution (Python, Perl, Bash, or similar): I drop in any future text file and get a cleaned version out.
• The option to toggle common tasks (deduplicate lines, normalise whitespace, remove non-ASCII or other special symbols) so I can activate only what each document actually needs.
• Clean, well-commented source code and a short usage note so I can run everything from the command line on macOS or Linux.

I will supply a sample set of documents right after kickoff; you’ll deliver the cleaned output along with the script. Once I confirm the script handles the entire sample set without mangling the original meaning or structure, we’re done.

Google Drive link: https://drive.google.com/file/d/1liFKPdMSU4k7CQT084lP3Khc-Md-DuG4/view?usp=drive_link
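To make the brief concrete, here is a minimal Python sketch of the kind of toggleable clean-up script described above. It is an illustration under stated assumptions, not the delivered solution: the file name `clean_text.py`, the flags `--dedupe`, `--normalise-whitespace` and `--ascii-only`, the `<name>.clean.txt` output naming, and the exact cleaning rules are all hypothetical. Assumed rules: deduplication keeps the first occurrence of each non-blank line (comparing with whitespace collapsed) and never removes blank lines, so paragraph breaks survive; whitespace normalisation trims each line, collapses inner runs of spaces and tabs, and squeezes consecutive blank lines into one; the ASCII step maps curly quotes and dashes to plain equivalents and decomposes accented letters (so "Café" becomes "Cafe") before dropping anything still outside printable ASCII. Line endings are always converted to LF.

```python
#!/usr/bin/env python3
"""clean_text.py: hypothetical sketch of a toggleable plain-text cleaner.

Every cleaning step is opt-in via a flag. Line endings are always
normalised to LF; the optional steps then run in a fixed order:
special characters -> duplicate lines -> whitespace.
"""
import argparse
import re
import unicodedata
from pathlib import Path

# Assumed mapping of common typographic symbols to plain ASCII stand-ins.
_TYPOGRAPHY = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})
# Anything that is not a tab, a newline, or printable ASCII.
_NON_ASCII = re.compile(r"[^\t\n\x20-\x7e]")
# Horizontal whitespace: any whitespace character except newline.
_HSPACE = re.compile(r"[^\S\n]+")


def strip_special(text: str) -> str:
    """Map typographic symbols, decompose accents (NFKD), drop non-ASCII."""
    text = unicodedata.normalize("NFKD", text.translate(_TYPOGRAPHY))
    return _NON_ASCII.sub("", text)


def dedupe_lines(text: str) -> str:
    """Keep the first occurrence of each non-blank line; blank lines stay."""
    seen = set()
    kept = []
    for line in text.split("\n"):
        key = " ".join(line.split())  # compare with whitespace collapsed
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


def normalise_whitespace(text: str) -> str:
    """Trim lines, collapse inner spaces/tabs, squeeze blank-line runs."""
    out = []
    for line in text.split("\n"):
        line = _HSPACE.sub(" ", line).strip()
        if not line and out and not out[-1]:
            continue  # already have one blank line here
        out.append(line)
    result = "\n".join(out).strip("\n")
    return result + "\n" if result else ""


def clean(text: str, *, dedupe: bool = False, whitespace: bool = False,
          ascii_only: bool = False) -> str:
    """Apply the enabled steps; line endings are always converted to LF."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if ascii_only:
        text = strip_special(text)
    if dedupe:
        text = dedupe_lines(text)
    if whitespace:
        text = normalise_whitespace(text)
    return text


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Clean plain-text files.")
    parser.add_argument("files", nargs="*", type=Path, help="input .txt files")
    parser.add_argument("--dedupe", action="store_true",
                        help="drop repeated non-blank lines")
    parser.add_argument("--normalise-whitespace", action="store_true",
                        help="trim lines, collapse spaces and blank-line runs")
    parser.add_argument("--ascii-only", action="store_true",
                        help="transliterate or drop non-ASCII symbols")
    args = parser.parse_args(argv)
    if not args.files:
        parser.print_help()
        return 0
    for path in args.files:
        text = path.read_text(encoding="utf-8", errors="replace")
        cleaned = clean(text, dedupe=args.dedupe,
                        whitespace=args.normalise_whitespace,
                        ascii_only=args.ascii_only)
        out = path.with_name(path.stem + ".clean.txt")  # notes.txt -> notes.clean.txt
        out.write_text(cleaned, encoding="utf-8")
        print(f"{path} -> {out}")
    return 0


if __name__ == "__main__":
    main()
```

Usage, assuming the hypothetical name above: `python3 clean_text.py --normalise-whitespace --dedupe notes.txt` writes `notes.clean.txt` next to the input, and running it with no files prints the help text. The fixed step order is a design choice: deduplicating before whitespace normalisation lets the final step tidy any blank-line runs left behind where duplicates were removed. Whether duplicates should be removed anywhere in a file or only when consecutive is an open question the sample set would need to settle.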