Sub-Second Real-Time AI Avatars

Client: AI | Published: 07.02.2026
Budget: $750

I already have two live avatars that mirror real people as they speak to one another, but the current pipeline adds several seconds of lag. I need your help driving total end-to-end latency (mouth movement, facial animation, and generated voice) down to a hard ceiling of one second, while keeping everything completely self-hosted: no paid APIs or usage-based services. The finished solution must plug straight into my existing platform and remain free for the public to use.

You will begin by profiling the present stack and spotting where frames or audio buffers pile up, then redesign the generation and streaming loop so that speech-to-text, text-to-speech, facial blend-shape synthesis, and video compositing all run in near real time. GPU acceleration, WebRTC, low-level ffmpeg calls, and on-device inference with models such as Stable Audio, RVC, or ONNX-exported networks are all welcome, so long as licensing and runtime costs stay at zero. Rough sketches of a profiling harness, a streaming loop, and low-latency ffmpeg settings follow at the end of this brief.

Deliverables
• Bottleneck report with timing breakdowns
• Optimised code (Python/Node/C++, whichever is fastest) producing <1 s audio-video latency at 720p30 on a mid-range RTX-class GPU
• Integration notes and a one-command deployment script (Docker or bare metal) so my devs can slot it straight into production

Acceptance will be measured with a stopwatch, from microphone input to rendered frame and speaker output. If you have experience squeezing every millisecond out of real-time AI pipelines, I'm ready to hand you full test access and dive straight into the code. Please read this brief first, then contact us if you can do this or have done it before!
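For the bottleneck report, a per-stage timing harness is usually the first step. Below is a minimal Python sketch, not taken from any existing codebase: `StageTimer` is a hypothetical name, and `run_stt`/`run_tts` in the commented usage stand in for whatever the current loop actually calls.

```python
import time
from collections import defaultdict

class StageTimer:
    """Accumulates wall-clock timings per pipeline stage."""
    def __init__(self):
        self.samples = defaultdict(list)

    def measure(self, stage, fn, *args, **kwargs):
        """Run fn, record its latency under `stage`, and return its result."""
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        self.samples[stage].append(time.perf_counter() - t0)
        return result

    def report(self):
        """Print mean latency per stage, worst stage first."""
        means = {s: sum(v) / len(v) for s, v in self.samples.items()}
        for stage, mean in sorted(means.items(), key=lambda kv: -kv[1]):
            print(f"{stage:20s} {mean * 1000:8.1f} ms  (n={len(self.samples[stage])})")

# Hypothetical usage, wrapping each stage of the existing loop:
#   timer = StageTimer()
#   text   = timer.measure("stt", run_stt, audio_chunk)
#   speech = timer.measure("tts", run_tts, text)
#   timer.report()

# Self-contained demo with a fake 40 ms stage:
timer = StageTimer()
timer.measure("demo_stage", time.sleep, 0.04)
timer.report()
```

A breakdown like this is also what turns the stopwatch acceptance test into something actionable: it shows which stage owns each slice of the mic-to-speaker budget.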
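To get the stages running in near real time rather than strictly one after another, a common pattern is a chunked pipeline with bounded queues, so each stage works on chunk N while the next stage consumes chunk N-1 and back-pressure keeps latency from accumulating. A minimal sketch, assuming the real STT/TTS models expose chunk-level streaming calls; `stt_chunk` and `tts_chunk` here are placeholders, not real model APIs:

```python
import queue
import threading

def pipeline_stage(work_fn, inbox, outbox):
    """Pull items from inbox, transform them, push downstream; None = shutdown."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)   # forward the shutdown sentinel
            break
        outbox.put(work_fn(item))

# Placeholder stage functions standing in for streaming STT/TTS models.
def stt_chunk(audio):
    return f"text({audio})"

def tts_chunk(text):
    return f"speech({text})"

# maxsize=1 enforces back-pressure: a slow stage can never silently
# buffer more than one chunk, so queueing delay stays bounded.
q_audio, q_text, q_speech = (queue.Queue(maxsize=1) for _ in range(3))

threads = [
    threading.Thread(target=pipeline_stage, args=(stt_chunk, q_audio, q_text), daemon=True),
    threading.Thread(target=pipeline_stage, args=(tts_chunk, q_text, q_speech), daemon=True),
]
for t in threads:
    t.start()

for i in range(3):      # feed three demo "audio chunks"
    q_audio.put(f"chunk{i}")
q_audio.put(None)       # shutdown sentinel

while (out := q_speech.get()) is not None:
    print(out)
```

The same skeleton extends to the blend-shape and compositing stages; the key property is that chunk size, not total utterance length, sets the floor on latency.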
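On the ffmpeg side, a handful of standard flags strip out most default buffering. The invocation below is a starting point rather than a drop-in config: the SDP input and UDP output URL are placeholders for whatever transport the platform actually uses.

```python
import subprocess

# nobuffer/low_delay skip input-side buffering, a tiny probesize and zero
# analyzeduration avoid a long stream-analysis stall on startup, and
# -tune zerolatency disables x264 frame lookahead.
cmd = [
    "ffmpeg",
    "-fflags", "nobuffer",
    "-flags", "low_delay",
    "-probesize", "32",
    "-analyzeduration", "0",
    "-i", "input.sdp",                      # placeholder input (e.g. an RTP session)
    "-c:v", "libx264",
    "-preset", "ultrafast",
    "-tune", "zerolatency",
    "-g", "30",                             # one keyframe per second at 30 fps
    "-f", "mpegts",
    "udp://127.0.0.1:5000?pkt_size=1316",   # placeholder output URL
]
subprocess.run(cmd, check=True)
```

For the WebRTC path the encoder settings matter more than the container flags, since WebRTC stacks handle their own packetization; the zerolatency/ultrafast choices above still apply to the encode itself.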