Senior Real-Time Audio Engineer

Client: AI | Posted: 05.01.2026

Summary

We are building an AI assistant that helps candidates during live interviews.

What the app does:
- Listens to the other person's voice only (from Zoom / Teams / video / system audio)
- Converts the other person's speech to text (STT)
- Generates AI-suggested answers
- Displays both the transcript and the AI response on screen in real time

Critical requirement (non-negotiable):
- The AI must NOT respond to the user's own voice
- The AI must listen only to the other person's audio

A previous developer implemented the base pipeline (audio streaming, Whisper STT, AI responses), but the core enforcement, audio correctness, and production hardening are incomplete. We are looking for a senior engineer to finish this properly.

What Is Already Implemented
- Browser audio capture (system audio + mic)
- WebSocket streaming (frontend → backend)
- Whisper STT integration
- AI response generation (LLM)
- Resume upload with PDF text extraction
- Basic UI displaying AI responses

You will not be starting from scratch.

Your Responsibilities

Audio / Real-Time (Primary Focus)
- Enforce system-audio-only transcription for interview/Q&A mode
- Ensure microphone audio never triggers AI answers
- Fix speech detection / gating issues (false triggers, noise)
- Implement echo / self-trigger prevention
- Eliminate audio feedback / loopback issues
- Improve buffering, chunking, and stability
- Prefer AudioWorklet over ScriptProcessor where appropriate

Backend (FastAPI / WebSockets)
- Fix audio source routing (system vs mic)
- Enforce session rules on the server side
- Add basic production safeguards (payload limits, buffer caps, timeouts)
- Ensure the AI does not respond to its own output

Frontend / UX (Light but Important)
- Clearly show the interviewer transcript and the AI-suggested answer
- Stable real-time UI updates during live interviews

Tech Stack (Required)

Frontend
- JavaScript / TypeScript
- Web Audio API (AudioWorklet preferred)
- WebSockets
- React (or similar)

Backend
- Python
- FastAPI
- WebSockets
- Audio processing (PCM, WAV handling)
- OpenAI Whisper (or equivalent STT API)

Audio Knowledge (Critical)
- System audio vs microphone capture in browsers
- Echo / loopback prevention
- Speech detection & gating
- Audio buffering and jitter handling
- Real-time streaming concepts

Important Clarification (Read Carefully)

We are looking for audio + real-time systems expertise, with experience using STT/LLM APIs (not building models).

Deliverables / Definition of Done

The job is complete when:
- The user speaks alone → no transcript, no AI response
- The interviewer speaks via system audio → the transcript appears → the AI response appears
- The AI does not trigger itself or loop
- The UI clearly separates the transcript and the AI answer
- The system works reliably in real Zoom/Teams interviews
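To make the server-side enforcement requirement concrete: in interview mode, mic-tagged audio must be dropped on the backend even if the client claims to filter it. A minimal sketch, assuming each incoming chunk carries a `source` tag ("system" vs "mic"); the `AudioChunk` and `SessionGate` names are illustrative, not the existing codebase's API:

```python
from dataclasses import dataclass

# Illustrative message shape -- the real pipeline's format may differ.
@dataclass
class AudioChunk:
    source: str   # "system" (interviewer audio) or "mic" (the user's own voice)
    pcm: bytes    # raw 16-bit PCM payload

class SessionGate:
    """Server-side rule: in interview mode, only system audio may reach STT."""

    def __init__(self, mode: str = "interview"):
        self.mode = mode
        self.dropped_mic_chunks = 0

    def admit(self, chunk: AudioChunk) -> bool:
        # Enforce on the server: a mis-routed mic chunk must never
        # produce a transcript or trigger an AI answer.
        if self.mode == "interview" and chunk.source != "system":
            self.dropped_mic_chunks += 1
            return False
        return True

gate = SessionGate()
assert gate.admit(AudioChunk("system", b"\x00\x00")) is True
assert gate.admit(AudioChunk("mic", b"\x00\x00")) is False
```

The same check belongs in the WebSocket receive loop, before any audio is buffered for Whisper.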
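One common way to address the false-trigger / gating item is an energy (RMS) gate with a hangover period, so brief pauses inside an utterance do not close the gate mid-word. The threshold and hangover values below are illustrative assumptions, not tuned figures:

```python
import array
import math

def rms(frame: bytes) -> float:
    """RMS level of a 16-bit little-endian PCM frame."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

class SpeechGate:
    """Energy gate with hangover: opens on loud frames and stays open
    for `hangover` quiet frames so words are not clipped mid-utterance."""

    def __init__(self, threshold: float = 500.0, hangover: int = 5):
        self.threshold = threshold   # assumed noise-floor cutoff
        self.hangover = hangover
        self.quiet_frames = 0
        self.open = False

    def process(self, frame: bytes) -> bool:
        if rms(frame) >= self.threshold:
            self.open = True
            self.quiet_frames = 0
        elif self.open:
            self.quiet_frames += 1
            if self.quiet_frames > self.hangover:
                self.open = False
        return self.open

gate = SpeechGate(threshold=500.0, hangover=2)
loud = array.array("h", [4000] * 160).tobytes()
silence = bytes(320)  # 160 zero samples
assert gate.process(silence) is False   # noise floor: gate stays closed
assert gate.process(loud) is True       # speech opens the gate
assert gate.process(silence) is True    # hangover keeps it open briefly
```

A production version would likely use a proper VAD, but the gate/hangover structure is the same.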
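For the "AI does not trigger itself or loop" requirement, one simple server-side safeguard is to compare each new transcript against the assistant's recent answers and suppress responses when the transcript is merely an echo of the AI's own output (e.g. played back through speakers). This sketch uses stdlib fuzzy matching; the class name and threshold are assumptions:

```python
from collections import deque
from difflib import SequenceMatcher

class SelfTriggerFilter:
    """Drop transcripts that merely echo the assistant's own recent answers,
    so the pipeline cannot feed back on itself via speaker loopback."""

    def __init__(self, history: int = 5, similarity: float = 0.85):
        self.recent_answers = deque(maxlen=history)
        self.similarity = similarity  # assumed echo-detection threshold

    def note_answer(self, text: str) -> None:
        self.recent_answers.append(text.lower().strip())

    def should_respond(self, transcript: str) -> bool:
        t = transcript.lower().strip()
        for answer in self.recent_answers:
            if SequenceMatcher(None, t, answer).ratio() >= self.similarity:
                return False  # transcript is (close to) our own output
        return True

f = SelfTriggerFilter()
f.note_answer("I have five years of Python experience.")
assert f.should_respond("i have five years of python experience") is False
assert f.should_respond("Tell me about your Python experience.") is True
```

Echo cancellation at the audio layer is the stronger fix; this text-level check is a cheap second line of defense.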
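The "payload limits, buffer caps" safeguard can be sketched as a bounded audio buffer that rejects oversized messages and drops the oldest audio when a cap is exceeded. The limits below are placeholder values, not recommendations:

```python
class BoundedAudioBuffer:
    """Cap buffered audio so a stalled consumer or a flood of oversized
    payloads cannot grow memory without bound (oldest data is dropped)."""

    def __init__(self, max_bytes: int = 1_000_000, max_chunk: int = 64_000):
        self.max_bytes = max_bytes   # total buffered audio cap
        self.max_chunk = max_chunk   # per-message payload limit
        self.chunks: list[bytes] = []
        self.size = 0

    def push(self, chunk: bytes) -> bool:
        if len(chunk) > self.max_chunk:
            return False  # reject oversized payloads outright
        self.chunks.append(chunk)
        self.size += len(chunk)
        while self.size > self.max_bytes:  # drop oldest audio first
            dropped = self.chunks.pop(0)
            self.size -= len(dropped)
        return True

buf = BoundedAudioBuffer(max_bytes=10, max_chunk=6)
assert buf.push(bytes(8)) is False   # over the per-message limit
assert buf.push(bytes(6)) and buf.push(bytes(6))
assert buf.size <= 10                # cap enforced: oldest chunk dropped
```

In the real backend the same limits would sit in the WebSocket handler, alongside idle-connection timeouts.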