Quantitative measure for similarity between keywords

rida · September 19, 2022, 5:50pm

Hello…
I have been using Edge Impulse to develop different models, and I am now working on a question:
Does model performance degrade if the keywords are too similar? To support the answer, I'm looking for an algorithm that can quantify the similarity between audio keywords.

Just to be clear, I'm working on audio recognition (keyword spotting).
Any suggestions are welcome!

OmarShrit · September 19, 2022, 6:15pm

Hi Rida,

The question is not valid from a machine learning point of view. The quality of a model depends on the amount, variation, and quality of the data samples collected and used in training.

If the keywords are similar, then naturally you need more data. If they are very different, a smaller amount of data might be good enough.

Best regards,
Omar

rida · September 20, 2022, 7:06am

Actually, I intend to keep the number of data samples constant across two models:

one that has distinct keywords,
and another that has similar ones.

If the performance metrics degrade for the model with similar keywords, I want to quantify the similarity so that the degradation can be supported with evidence.
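
One possible way to put a number on this, as a sketch only: dynamic time warping (DTW) is a common technique for comparing two spoken words of different lengths, applied to per-frame audio features such as MFCCs. Nothing below comes from Edge Impulse. The helper names `euclidean` and `dtw_distance` are hypothetical, and the feature frames are assumed to have been extracted elsewhere (one list of numbers per audio frame). A lower score would suggest more similar keywords.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW cost between two sequences of feature frames.

    seq_a, seq_b: lists of frames, each frame a list of floats
    (assumed here to be MFCC-like features computed elsewhere).
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Divide by the combined length so long and short words are comparable.
    return cost[n][m] / (n + m)
```

In an experiment like the one described, the score could be averaged over many recordings of each keyword pair and then compared against the per-class accuracy or confusion matrix of the two models. Whether DTW on MFCCs correlates with classifier confusion is an assumption that would need to be checked on the actual data.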