My goal is to turn a messy collection of raw files into a clean, analysis-ready dataset hosted on Google Cloud so I can make faster, better-informed decisions.

The data first needs three specific text-cleaning steps: removing duplicates, correcting typos, and standardising formats. Once those tasks are automated, the cleaned output should flow straight into a Google Cloud environment (BigQuery is ideal, but Cloud Storage plus Cloud Functions works too). After the upload, I want an exploratory analysis that highlights patterns, trends, and any obvious outliers that deserve attention. To make the brief concrete, rough sketches of each stage appear at the end.

Deliverables

• Well-commented scripts or workflows that automate duplicate removal, spell-checking, and format standardisation
• A repeatable pipeline that loads the cleansed data into Google Cloud and can be triggered on demand
• At least one concise insight report or dashboard with visualisations and written commentary
• Clear documentation so I can rerun or extend the process without guesswork

Python, SQL, and native Google Cloud tools such as Dataflow, BigQuery, or Looker Studio are perfectly acceptable here, but if you have a strong case for another tool inside the Google suite, I'm happy to hear it. Accuracy, transparency, and clean code will serve as the acceptance criteria. Please include a brief timeline with milestones for cleaning, cloud setup, and insight delivery when you respond.
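Illustrative sketches

For the cleaning stage, this is roughly the shape I have in mind (a minimal sketch, assuming pandas and CSV input; the file name, the order_date column, and the typo map are hypothetical placeholders, and a real deliverable should use a proper spell-checking library rather than a hand-written mapping):

```python
import pandas as pd

# Hypothetical whole-cell corrections; a dictionary- or fuzzy-matching-based
# spell-checker would replace this in the real pipeline.
TYPO_MAP = {"recieved": "received", "adress": "address"}

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Step 1: remove exact duplicate rows.
    df = df.drop_duplicates()
    text_cols = df.select_dtypes(include="object").columns
    for col in text_cols:
        # Step 2: correct known typos (exact cell matches only in this sketch).
        df[col] = df[col].replace(TYPO_MAP)
        # Step 3: standardise formats by trimming whitespace and normalising case.
        df[col] = df[col].str.strip().str.lower()
    # Step 3 (continued): parse a hypothetical date column into one format.
    if "order_date" in df.columns:
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    return df

if __name__ == "__main__":
    raw = pd.read_csv("raw_data.csv")  # placeholder path
    clean(raw).to_csv("cleaned_data.csv", index=False)
```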
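For the load stage, a sketch of an on-demand load into BigQuery using the google-cloud-bigquery client library (project, dataset, and table names are placeholders; the same function could be wrapped in a Cloud Function so the pipeline can be triggered on demand):

```python
from google.cloud import bigquery

def load_to_bigquery(csv_path: str, table_id: str) -> None:
    client = bigquery.Client()  # uses application-default credentials
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # block until the load job completes
    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}")

if __name__ == "__main__":
    # Placeholder table identifier in project.dataset.table form.
    load_to_bigquery("cleaned_data.csv", "my-project.my_dataset.cleaned")
```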
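And for the exploratory pass, a first-cut example of surfacing summary statistics and simple numeric outliers in pandas (the three-standard-deviation threshold is an illustrative assumption, not a requirement):

```python
import pandas as pd

df = pd.read_csv("cleaned_data.csv")  # placeholder path
print(df.describe(include="all"))     # per-column summary statistics

# Flag rows where any numeric value sits more than three standard
# deviations from its column mean.
numeric = df.select_dtypes(include="number")
z_scores = (numeric - numeric.mean()) / numeric.std()
outliers = df[(z_scores.abs() > 3).any(axis=1)]
print(f"{len(outliers)} rows flagged as potential outliers")
```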