Backend for my app using FastAPI, WebSockets/WebRTC, and modern AI services (Whisper STT, LLaMA-style LLMs, Kokoro/Coqui/Piper TTS, Wav2Lip for avatar video). The system powers real-time speech transcription, AI-assisted text polishing, and interactive avatar-based conversations.

Responsibilities
- Backend ownership: design, maintain, and extend FastAPI HTTP and WebSocket/WebRTC endpoints.
- AI integration: connect and optimize Whisper STT, LLMs, and TTS services for low latency and high reliability.
- Audio/video pipelines: handle streaming audio (buffering, resampling, normalization); integrate avatar video generation with Wav2Lip and related tools.
- Data & performance: work with MongoDB (and/or relational databases) on schemas, indexing, and efficient queries; implement caching and background jobs to keep p95 latency within strict targets.
- Reliability & quality: improve error handling, observability (logging/metrics), and tracing; write clean, well-structured, testable Python code.

Must-Have Skills
- Async I/O and concurrency: comfortable designing and debugging async workflows.
- Hands-on AI integration experience with at least one of:
  - Whisper STT or other speech-to-text engines.
  - LLaMA/transformer-based LLMs or OpenAI-style APIs.
  - TTS systems such as Coqui, Kokoro, or Piper.
- Realtime systems: WebSockets, WebRTC, or other low-latency streaming architectures.

Nice to Have
- GPU & deployment experience: CUDA, GPU environments, and performance tuning (CPU vs GPU); Docker, nginx, PM2, and production deployment pipelines.
- Background processing: job queues/workers for heavy audio/video processing; experience orchestrating long-running media/AI tasks.
- Video processing tools: FFmpeg, Wav2Lip, or similar for video generation and post-processing.