Quantitative measure for similarity between keywords

I have been using Edge Impulse to develop different models, and I am now working on a question…
Does model performance degrade if the keywords are too similar? To support the answer, I'm searching for an algorithm that can quantify the similarity between audio keywords.

Just to be clear, I'm working with audio recognition (keyword spotting).
Any suggestions are welcome!

Hi Rida,

The question is not valid from a machine learning point of view. The quality of a model depends on the amount, variation, and quality of the data samples collected and used for training.

If the keywords are similar, then naturally you need more data to tell them apart. If they are very different, a smaller amount of data might be enough.

Best regards,

Actually, I intend to keep the number of data samples constant across 2 models:

One that has distinct keywords,
and another that has similar ones.

If the performance metrics degrade for the model with similar keywords, I want to quantify the similarity so that the degradation can be supported with evidence!
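
One possible way to put a number on "similarity" here, as a minimal sketch rather than an established recipe: compare the feature sequences of two keywords with dynamic time warping (DTW), which tolerates differences in speaking speed. The code below assumes each keyword has already been turned into a sequence of feature frames (for example MFCC vectors exported from whatever DSP step you use); the function names frame_distance, dtw_distance and mean_pairwise_distance are made up for this illustration, and using a single recording per keyword is a simplification.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalised DTW cost between two sequences of feature frames.

    Lower values mean the two recordings sound more alike under this measure.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    # Dividing by n + m keeps long and short keywords roughly comparable.
    return cost[n][m] / (n + m)


def mean_pairwise_distance(keyword_features):
    """Average DTW distance over all keyword pairs in one keyword set.

    keyword_features: dict mapping keyword name -> sequence of feature frames
    (hypothetical input format chosen for this sketch).
    """
    names = sorted(keyword_features)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        raise ValueError("need at least two keywords")
    total = sum(dtw_distance(keyword_features[a], keyword_features[b])
                for a, b in pairs)
    return total / len(pairs)
```

With the number of samples held constant, mean_pairwise_distance could be computed once for the "distinct" keyword set and once for the "similar" set, and reported next to each model's accuracy. For more robust evidence, it would probably be better to average the distance over many recordings per keyword instead of one.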