I have a growing collection of Yiddish lecture recordings, some streamed live and others stored as audio files, that must be transcribed with near-publishing accuracy. Whisper's out-of-the-box performance is promising, but it still drops domain-specific vocabulary and speaker nuances. I want a fine-tuned Whisper model that reliably converts these lectures to text, whether the audio arrives in real time or from an archive.

You will receive aligned Yiddish audio–text pairs plus a small evaluation script. Use whichever Whisper implementation you prefer (OpenAI's official repo, Hugging Face Transformers, or another PyTorch-based fork) so long as the final system:

• achieves a single-digit word error rate (WER) on my hold-out set
• preserves accurate timestamps for downstream subtitling
• runs on a single A100 GPU and can be quantized for CPU inference

Deliverables I need back:

• the fine-tuned model checkpoint(s)
• reproducible training and inference scripts or notebooks, with hyperparameters clearly logged
• a brief report comparing baseline vs. fine-tuned performance (WER, CER, latency)

The project is complete once these items pass my internal tests: a live Zoom feed and batch file transcription.
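For context on how the report metrics will be read: a minimal, standard-library-only sketch of the WER and CER calculations (Levenshtein edit distance over words and characters, respectively). This is illustrative only and not the evaluation script you will receive; function names here are my own placeholders.

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))  # distances for the empty-prefix row
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[n]

def wer(ref_text, hyp_text):
    """Word error rate: word-level edit distance over reference word count."""
    ref, hyp = ref_text.split(), hyp_text.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)

def cer(ref_text, hyp_text):
    """Character error rate: character-level edit distance over reference length."""
    return levenshtein(list(ref_text), list(hyp_text)) / max(len(ref_text), 1)
```

Both metrics are computed against the reference transcript, so a WER of 0.08 on the hold-out set would satisfy the single-digit (percentage) requirement above.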