We have a keyword recognition net that now seems to perform quite well. However, its Achilles’ heel is background music: in the presence of any sort of background music, the success rate falls to zero. It would seem to make sense to add background music during training, but I don’t know whether this is considered best practice or a realistic approach.
If it is, would it be possible at some point to add background music, or possibly just “custom” noise sources, to the data augmentation options?
Yes, this is a good approach, particularly if you have a specific kind of background noise to deal with.
We have an example transformation block here: https://github.com/edgeimpulse/example-transform-block-mix-noise
This feature is reserved for enterprise subscriptions, but you can check the shell script in that repository and apply a similar transformation locally using sox.
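If you would rather do the mixing in Python than with sox, here is a minimal sketch of the general idea: overlay a random slice of a noise or music clip on each keyword sample at a chosen signal-to-noise ratio. This is not the code from the transform block above. The names `rms` and `mix_at_snr` are made up for illustration, and it assumes both clips are already decoded to lists of float samples in the range -1.0 to 1.0 at the same sample rate.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, rng=random):
    """Overlay `noise` on `speech` at the requested signal-to-noise ratio (dB).

    Hypothetical helper: both inputs are float samples in [-1.0, 1.0]
    recorded at the same sample rate.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Loop the noise so it is longer than the speech, then take a random window.
    repeats = len(speech) // len(noise) + 1
    looped = list(noise) * repeats
    start = rng.randrange(len(looped) - len(speech) + 1)
    segment = looped[start:start + len(speech)]

    noise_rms = rms(segment)
    if noise_rms == 0.0:
        # A silent noise window adds nothing; return the speech unchanged.
        return list(speech)

    # Choose the gain so that 20*log10(rms(speech) / rms(gain*segment)) == snr_db.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    mixed = [s + gain * n for s, n in zip(speech, segment)]

    # Scale the whole mix down if it would clip.
    peak = max(abs(m) for m in mixed)
    if peak > 1.0:
        mixed = [m / peak for m in mixed]
    return mixed
```

You would call this once or more per training sample with a randomly chosen SNR, for example `mix_at_snr(keyword, music, random.uniform(0, 20))`, and upload the results alongside the clean samples. The 0 to 20 dB range is only a guess at a starting point; pick values that match how loud the music is in your real environment.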