Your own "Okay Google", or how to detect the start of a request

Question

I have an idea for a Python application that recognizes a spoken request and looks up the answer on the Internet.
The application will run for days at a time, so it needs to detect a start phrase somehow, say the same "Ok Google". How can this be implemented?
Ordinary speech can be recognized through SpeechKit or a similar service, but sending voice recordings to the server all day long is hardly an option.
The application has to recognize the start phrase by itself, and only then record the request and send it to SpeechKit for recognition.

Please tell me how to detect a specific phrase without a server and without extra load, so that it can run constantly in the background.

So the question is: how do I recognize a fixed phrase without remote services?
One idea: analyze the input audio stream, and when a waveform with an amplitude pattern similar to "ok, google" arrives, send it to the recognizer; if the recognizer confirms the phrase, send everything that follows it.
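
The first step of that idea, looking at the amplitude of the incoming stream, can be sketched as a simple energy gate. This is only an illustration under assumptions: the audio is taken to be 16-bit mono PCM at 16 kHz in native byte order, and the names `frame_rms` and `loud_frames`, the 20 ms frame size, and the threshold of 500 are all hypothetical values that would need tuning. It does not match the phrase itself; it only drops silence so that a local recognizer runs on loud frames alone.

```python
import math
from array import array


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit signed PCM samples."""
    samples = array("h", frame)  # "h" = signed 16-bit, native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_frames(pcm: bytes, frame_bytes: int = 640, threshold: float = 500.0):
    """Yield (frame_index, frame) for every whole frame at or above the threshold.

    640 bytes = 320 samples = 20 ms at 16 kHz (assumed format).
    A trailing partial frame is ignored.
    """
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if frame_rms(frame) >= threshold:
            yield start // frame_bytes, frame


if __name__ == "__main__":
    # Synthetic input: 20 ms of silence, 20 ms of a 440 Hz tone, 20 ms of silence.
    silence = array("h", [0] * 320).tobytes()
    tone = array(
        "h",
        [int(8000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(320)],
    ).tobytes()
    for index, _ in loud_frames(silence + tone + silence):
        print("loud frame:", index)
```

Only the frames that pass the gate would then be handed to the phrase recognizer, which keeps the background load low while nobody is speaking.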

Maria Mamonova · 2019-03-06T08:12:01

Answer 1

You can use CMUSphinx, but to recognize Russian it needs a Russian acoustic model, for example cmusphinx-ru-5.2 (download link for the model: https://es.osdn.net/projects/sfnet_cmusphinx/downloads/Acoustic%20and%20Language%20Models/Russian/cmusphinx-ru-5.2.tar.gz/). In addition, Sphinx lets you create your own dictionary and grammar file, which speeds up recognition considerably when the application only needs a limited set of words or commands.
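
As a rough sketch of what this could look like in Python: `build_keyword_list` below writes the plain-text keyword list that Sphinx keyword spotting reads (one phrase per line followed by a detection threshold between slashes), and `wait_for_wake_phrase` shows one possible way to listen for a single phrase. The second function assumes the third-party `pocketsphinx` Python package and its `LiveSpeech` class are installed; both function names, the threshold values, and the `model_options` pass-through are illustrative assumptions, not something taken from the answer.

```python
def build_keyword_list(phrases: dict) -> str:
    """Return keyword-list text: one 'phrase /threshold/' entry per line.

    `phrases` maps a phrase to its detection threshold. The thresholds
    are placeholders and have to be tuned on real recordings.
    """
    lines = []
    for phrase, threshold in phrases.items():
        lines.append(f"{phrase.strip().lower()} /{threshold:g}/")
    return "\n".join(lines) + "\n"


def wait_for_wake_phrase(phrase: str, threshold: float = 1e-20, **model_options) -> None:
    """Block until the local decoder spots `phrase` in microphone audio.

    Assumption: the `pocketsphinx` package provides `LiveSpeech` with the
    `keyphrase` and `kws_threshold` options. `model_options` is a
    hypothetical pass-through for the paths to the unpacked Russian model
    and dictionary; the exact option names depend on the installed version.
    """
    from pocketsphinx import LiveSpeech  # imported lazily: third-party package

    speech = LiveSpeech(
        lm=False,                 # no language model, keyword spotting only
        keyphrase=phrase,
        kws_threshold=threshold,
        **model_options,
    )
    for _ in speech:              # each iteration corresponds to one detection
        return


if __name__ == "__main__":
    print(build_keyword_list({"Ok Google": 1e-20, "stop": 1e-10}), end="")
```

Every word of the phrase has to be present in the pronunciation dictionary of the model in use, so a Russian start phrase needs the dictionary that ships with the Russian model. Once `wait_for_wake_phrase` returns, the application can start recording and send only that recording to SpeechKit.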

A couple of useful links:

https://habr.com/en/post/351376/ - an article on Habr about using Sphinx

https://cmusphinx.imtqy.com - the official guide
