Hi guys, I am a computer science student currently using Edge Impulse for my final degree project. I would like to know whether it would be possible to make a keyword spotting model like the one in the tutorial section, but one that only responds to a specific person's voice. If anyone has an idea, I would be very grateful.
Welcome @josemimo2
Awesome you are using Edge Impulse for your project!
Not sure how much advice we can give on this. Look for some review papers on the subject and discuss them with your supervisor, though; you may find that pitch is used for speaker identification.
As a small test of pitch, you could build a model that returns "approve" or "deny" for a given keyword depending on who says it. You can do this with a classmate, or by shifting the pitch on a recording of your own voice.
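If you go the pitch-shifting route, a quick sketch like the one below could generate the shifted test samples (assuming librosa and soundfile are installed; the file names are just placeholders):

```python
# Sketch: pitch-shift one recording to create "other speaker" test samples.
# Assumes librosa and soundfile are installed; file names are placeholders.
import librosa
import soundfile as sf

# Load a keyword recording of your own voice, resampled to 16 kHz.
y, sr = librosa.load("my_voice_open_door.wav", sr=16000)

# Shift up and down by a few semitones to roughly simulate different speakers.
for n_steps in (-4, -2, 2, 4):
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    sf.write(f"shifted_{n_steps:+d}.wav", shifted, sr)
```

You can then upload the original clips as one class and the shifted clips as the other and see whether the model separates them.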
Please do share your project with us once it's published, and you can reference us through the publication we made with Harvard: [2212.03332] Edge Impulse: An MLOps Platform for Tiny Machine Learning
Best,
Eoin
Hello @josemimo2,
I'm not entirely sure it will work, but I'd try a spectrogram pre-processing block plus an anomaly detection learning block for this kind of project. As @Eoin mentioned, I'd also start with a given keyword.
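If it helps, here is roughly how I would prototype that idea offline with scikit-learn before moving it into the Studio. This is only a sketch, not how the Studio blocks are implemented, and the file names are placeholders:

```python
# Rough offline prototype: spectrogram features + distance-based anomaly scoring.
# Not the Edge Impulse implementation; just a sketch with librosa and scikit-learn.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def features(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    # Mel spectrogram in dB, averaged over time to keep the feature count small.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
    S_db = librosa.power_to_db(S, ref=np.max)
    return S_db.mean(axis=1)

# "Normal" data: the target speaker saying the keyword (placeholder file list).
train = np.stack([features(p) for p in ["me_01.wav", "me_02.wav", "me_03.wav"]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train)

def anomaly_score(path):
    # Distance to the nearest cluster centre; a large distance suggests an unknown speaker.
    f = features(path).reshape(1, -1)
    return km.transform(f).min()

print(anomaly_score("someone_else.wav"))
```

You would then pick a score threshold on held-out clips of your own voice versus other voices.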
Let us know of your results, I’m curious.
Best,
Louis
Can you elaborate more on this?
@Joeri I assume an example scenario @josemimo2 is talking about is: you walk up to the front door of your house and say "unlock door". Of course, the door should not open for anyone else, even if they know the magical open-sesame phrase.
@josemimo2 if you get this working, make the Edge Impulse Studio project public. Since the trained voice samples will be in the project, I would like to see whether I, or maybe a voice actor, could replicate the trained voice. Given that AI/ML can now replicate anyone's voice, this may not be a fail-safe way to control something. You'll need a defense-in-depth approach: maybe add a geofence, fingerprint scanner, forehead temperature checks (remember those?), etc.
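Just to illustrate what I mean by defense-in-depth, a toy unlock policy might require several signals to agree before opening the door. The thresholds and the geofence signal here are made-up assumptions, not anything the Studio produces:

```python
# Toy defense-in-depth unlock policy: no single signal is trusted on its own.
# Thresholds, signal names, and the geofence check are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class UnlockSignals:
    keyword_confidence: float   # keyword spotting model output, 0..1
    speaker_score: float        # similarity to the enrolled voice, 0..1
    inside_geofence: bool       # e.g. the owner's phone reports it is at the door

def should_unlock(s: UnlockSignals) -> bool:
    return (
        s.keyword_confidence >= 0.8
        and s.speaker_score >= 0.9
        and s.inside_geofence
    )

print(should_unlock(UnlockSignals(0.95, 0.92, True)))   # True
print(should_unlock(UnlockSignals(0.95, 0.40, True)))   # False: voice doesn't match
```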
You can have different scenarios for this type of use case. And indeed, for security you should combine several biometric factors.
I think you will end up with a very unbalanced dataset.
Basically, I want to test whether it is possible to identify the voice of a specific person. For example, in a security setting, if I say "Open Door" the device should open the door; if anyone else says it, the door should stay closed.
Anomaly detection doesn't work well with spectrograms because of the number of features.
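If you still want to experiment with that combination, one workaround (just an idea on my side, not an official block) is to shrink the feature count first, for example with PCA:

```python
# Possible workaround (an assumption, not an official recommendation): reduce the
# flattened spectrogram features with PCA before the anomaly detection step.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1600))   # stand-in for 50 clips x flattened spectrogram features

pca = PCA(n_components=16).fit(X)
X_small = pca.transform(X)        # 50 x 16: a much more manageable input for anomaly detection
print(X_small.shape)
```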