I have a clear, formal definition of both “organisational hierarchy” and “toxicity” that I will share as soon as the project starts. Using that reference, I need a Large Language Model fine-tuned to recognise:

• whether a piece of English text expresses toxicity, and
• whether the text is directed at someone junior or senior in the organisational hierarchy.

Toxicity should be rated according to the parameters set out in the definition I provide.

The data you will receive arrives in CSV files: each row contains a single text sample plus a label column I have already prepared for validation. If you would like to prototype on plain text or JSON first, that is fine, but the final pipeline must ingest the CSV format directly so I can drop new files in without extra preprocessing.

What I’m expecting from you

• A reproducible training script (Python + your preferred deep-learning library) that loads the CSVs, applies any necessary cleaning, and fine-tunes the base model.
• The trained model weights and an inference script or notebook that returns two outputs per line of text: a toxicity score/label and the detected organisational level.
• A brief README that explains environment setup, command-line usage, and how to extend the label set should we add additional hierarchy levels later.
• Evaluation results on the held-out set I provide (precision, recall, and F1 for each task).

Acceptance criteria

1. Macro-F1 ≥ 0.80 on both the toxicity and hierarchy-detection tasks when tested on my hidden validation CSV.
2. Inference completes at ≥ 150 samples/second on an A100 or comparable GPU.
3. All code runs in a clean virtual environment using only the libraries listed in the README.

If you have questions about the definitions, or would like a small sample of the data before bidding, just let me know.
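
To make the expected interface concrete, here is a minimal, non-binding sketch of the inference side. It is illustrative only: the column name "text", the placeholder model paths, and the use of two independently fine-tuned classifiers (rather than one shared multi-task head) are all my assumptions, and you are free to structure the pipeline differently as long as it ingests the CSVs and emits both outputs per row.

import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder paths; the actual fine-tuned weights are a deliverable.
TOXICITY_MODEL = "path/to/toxicity-model"
HIERARCHY_MODEL = "path/to/hierarchy-model"

def load_samples(csv_path: str) -> pd.DataFrame:
    """Load one of the provided CSV files; the "text" column name is an assumption."""
    return pd.read_csv(csv_path)

@torch.no_grad()
def predict(texts, model_name, batch_size=64, device="cuda"):
    """Return one predicted label index per input text, batched for throughput."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device).eval()
    labels = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt").to(device)
        logits = model(**batch).logits
        labels.extend(logits.argmax(dim=-1).tolist())
    return labels

if __name__ == "__main__":
    df = load_samples("samples.csv")
    texts = df["text"].tolist()
    df["toxicity_pred"] = predict(texts, TOXICITY_MODEL)
    df["hierarchy_pred"] = predict(texts, HIERARCHY_MODEL)
    df.to_csv("predictions.csv", index=False)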
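Likewise, a short sketch of how I would compute the Macro-F1 acceptance check with scikit-learn, so there is no ambiguity about the metric. The merged file name and the "*_label" / "*_pred" column names are hypothetical; only the averaging mode (macro) is fixed.

import pandas as pd
from sklearn.metrics import f1_score, classification_report

# Hypothetical file holding gold labels alongside model predictions.
df = pd.read_csv("predictions_with_gold.csv")

for task in ("toxicity", "hierarchy"):
    gold = df[f"{task}_label"]
    pred = df[f"{task}_pred"]
    # Macro-F1 averages per-class F1 scores with equal weight per class.
    print(f"{task} macro-F1: {f1_score(gold, pred, average='macro'):.3f}")
    print(classification_report(gold, pred))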