I would like a self-contained ComfyUI workflow that turns a prompt into a photorealistic video of up to 15 seconds and then applies frame-accurate lip-sync using the Nepali voice-over I already have. The end goal is to press “run” and walk away with a ready-to-use MP4.

Scope
• Design or adapt a ComfyUI graph that renders a realistic human face or full-body shot (your call, as long as it looks real).
• Integrate an audio-driven mouth-tracking solution so the spoken Nepali lines match perfectly.
• Optimise everything for my workstation (RTX 4090 24 GB, i9, 64 GB RAM) and install any required custom nodes, checkpoints, or Python dependencies locally. A remote session or step-by-step guidance is fine as long as I end up with a working setup.
• Hand me the workflow file, links to any assets, and a concise how-to so I can reproduce the clip with different prompts and audio later.

Acceptance
The clip must run up to 15 s, render without artefacts at 24–30 fps, and the lip-sync must stay aligned through the final frame. I’ll consider the job done once I can hit “queue” in ComfyUI, watch it process on my machine, and see the synced output without manual tweaks.

Only freelancers with proven video-generation and lip-sync experience, specifically inside ComfyUI, please.
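For reference, reproducing the clip with a different prompt and audio file can also be scripted against ComfyUI's HTTP API instead of clicking “queue” in the browser. Below is a minimal sketch, assuming the workflow is exported in API format and that the server runs at the default address; the node IDs "6" (text prompt) and "12" (audio loader) are placeholders that would need to match the IDs in the delivered workflow JSON:

```python
import json
import urllib.request

def patch_workflow(wf: dict, prompt_text: str, audio_path: str) -> dict:
    """Return a copy of the API-format workflow with a new prompt and audio file.

    Node IDs "6" and "12" are hypothetical -- replace them with the actual
    prompt node and audio-loader node IDs from the exported workflow JSON.
    """
    patched = json.loads(json.dumps(wf))  # cheap deep copy of plain JSON data
    patched["6"]["inputs"]["text"] = prompt_text
    patched["12"]["inputs"]["audio"] = audio_path
    return patched

def queue_workflow(wf: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's /prompt endpoint and return its response."""
    body = json.dumps({"prompt": wf}).encode("utf-8")
    req = urllib.request.Request(
        server + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

A run would then be `queue_workflow(patch_workflow(wf, "new prompt", "take2.wav"))`, leaving the original graph untouched so different prompt/audio pairs can be batched later.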