Precise “Hey Bobo” Wake-Word

I need a production-ready wake-word model that reliably responds to “Hey Bobo” on both ESP32 and Raspberry Pi boards. Accuracy is critical: in a moderately noisy room it should trigger every time someone clearly says the phrase, yet ignore most unrelated speech. I still want slight phonetic wiggle room: utterances such as “hello bo,” “hey bebo,” or “hello bebo” should wake the system as well, so carefully tuned thresholds and a well-balanced dataset will be essential.

What I expect from you
• A trained wake-word model or firmware that runs in real time on ESP-IDF for ESP32 and on Raspbian/Ubuntu for Raspberry Pi, without requiring cloud calls.
• Demo code that shows how to load the model, stream audio from an onboard mic, and raise an event when the wake word (or an approved variant) is detected.
• Instructions for collecting additional audio samples so I can keep refining the model if needed, plus clear guidelines on adjusting sensitivity to keep the wake-up rate high while preventing false positives.
• CPU- and memory-usage figures on each platform so I know exactly how lightweight the solution is.

Acceptance criteria
1. Latency from spoken phrase to callback ≤ 250 ms on both targets.
2. ≥ 95% wake-up rate in moderate background noise (office chatter, music at low volume).
3. ≤ 2 false activations per hour of continuous speech.
4. Complete build steps and source so I can reproduce the binary from scratch.

You are free to use tools like TensorFlow Lite Micro, Porcupine, Picovoice DIY, or a custom DSP pipeline, as long as licensing permits commercial use. I am happy to provide extra voice recordings to help you fine-tune. If you’ve shipped similar models before, let me know; a quick video demo on actual hardware will fast-track selection. Data collection and all other work are solely the freelancer's responsibility.
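For reference, the numeric acceptance criteria above could be checked with something like the following minimal Python sketch. The names TrialResults and meets_acceptance are hypothetical and only for illustration. It assumes latency is measured from the end of the spoken phrase to the callback and judged on the worst case, and that false activations are counted over speech that contains no wake word; how the test is actually run can be agreed separately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrialResults:
    """Hypothetical container for one test session on one target board."""
    positives_spoken: int        # times the phrase (or an approved variant) was clearly spoken
    positives_detected: int      # how many of those raised the wake event
    false_activations: int       # wake events during speech without the phrase
    negative_hours: float        # hours of continuous speech without the phrase
    latencies_ms: List[float]    # assumed: end of phrase to callback, per detection


def meets_acceptance(r: TrialResults,
                     min_wake_rate: float = 0.95,
                     max_fa_per_hour: float = 2.0,
                     max_latency_ms: float = 250.0) -> Dict[str, object]:
    """Compute the three numeric criteria and an overall pass/fail flag."""
    wake_rate = r.positives_detected / r.positives_spoken
    fa_per_hour = r.false_activations / r.negative_hours
    worst_latency = max(r.latencies_ms)  # assumption: worst case must meet the limit
    return {
        "wake_rate": wake_rate,
        "fa_per_hour": fa_per_hour,
        "worst_latency_ms": worst_latency,
        "passed": (wake_rate >= min_wake_rate
                   and fa_per_hour <= max_fa_per_hour
                   and worst_latency <= max_latency_ms),
    }


if __name__ == "__main__":
    # Made-up example numbers, not measurements from real hardware.
    session = TrialResults(200, 192, 3, 2.0, [180.0, 210.0, 240.0])
    print(meets_acceptance(session))
```

Running the same check separately for ESP32 and Raspberry Pi sessions would show whether both targets meet the limits.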

Python
