IMPORTANT: Question following your "Helloworld" tutorial

Tronic19 · January 10, 2021, 3:28pm

Hi to all,
This post is especially for “Janjan”, who made the HelloWorld tutorial (audio word spotting).

My question is the following:

If I want to do OK GOOGLE word spotting, what is the best approach?

Record OK and GOOGLE separately
(and then train the model to detect these two keywords)?
Or record OK GOOGLE directly as one phrase?

Which gives the most accurate and stable recognition, with the best success rate
(ideally in a noisy environment)?

Thanks a lot !

aurel · January 11, 2021, 9:29am

Hi @Tronic19,

It’s better to record it as one sentence. You would speak with a different intonation if you recorded the words separately, and this would affect the accuracy of the model.

Aurelien

janjongboom · January 11, 2021, 9:46am

Yep, grab the sentence, not the two words separately. From the docs:

Do keep in mind that some keywords are harder to distinguish from others, and especially keywords with only one syllable (like ‘One’) might lead to false-positives (e.g. when you say ‘Gone’). This is the reason that Apple, Google and Amazon all use at least three-syllable keywords (‘Hey Siri’, ‘OK, Google’, ‘Alexa’). A good one would be “Hello world”.

Tronic19 · January 11, 2021, 11:34am

Thanks to all for your replies.

OK, so I imagine that when recording, I must leave some time between my “OK” and “GOOGLE” (always different, but close to real-world use)? Is that right?

If I understand correctly, if I don’t do that, my “key phrase” will be too strict and the model will only recognize one exact gap between the two words.

Is that correct?

janjongboom · January 11, 2021, 12:35pm

Think of it not as two words, but rather as a single word with four syllables (“OKGOOGLE”) and yes, you need variation in spacing, pitch, speed etc. to make it robust.
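
To illustrate what "variation in spacing and speed" could look like, here is a minimal Python sketch that derives extra variants from one whole-phrase recording. It is only an assumption-laden illustration, not the tutorial's or Edge Impulse's method: the function names, the 16 kHz sample rate, the `boundary` index between "OK" and "GOOGLE", and the random ranges are all hypothetical. Real recordings with natural variation are still the primary source of data, and note that this naive resampling changes pitch together with speed.

```python
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz (hypothetical choice)


def change_speed(samples, factor):
    """Resample by linear interpolation.

    factor > 1 makes the clip shorter (faster, higher pitch);
    factor < 1 makes it longer (slower, lower pitch).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - int(pos)
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def stretch_pause(samples, boundary, extra_seconds):
    """Insert extra silence at `boundary`, the sample index between the two words."""
    silence = [0.0] * int(round(extra_seconds * SAMPLE_RATE))
    return samples[:boundary] + silence + samples[boundary:]


def make_variants(samples, boundary, count, rng):
    """Build `count` variants with a random pause length and a random speed."""
    variants = []
    for _ in range(count):
        pause = rng.uniform(0.0, 0.3)    # extra gap in seconds (assumed range)
        speed = rng.uniform(0.85, 1.15)  # speed factor (assumed range)
        variants.append(change_speed(stretch_pause(samples, boundary, pause), speed))
    return variants
```

For example, `make_variants(clip, boundary, 20, random.Random(0))` would return 20 lists of samples, where `clip` is one recording of the full phrase loaded as a list of floats and `boundary` is picked by hand.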