Hello! I am new to Edge Impulse. Does anyone know how to convert speech into text on an Arduino using Edge Impulse?
For voice recognition we have a few tutorials, these are primarily aimed at recognizing pre-defined keywords and distinct sounds:
- Responding to your voice: https://docs.edgeimpulse.com/docs/responding-to-your-voice
- Recognize sounds from audio: https://docs.edgeimpulse.com/docs/audio-classification
- How to Use Embedded Machine Learning to Do Speech Recognition on Arduino: https://www.digikey.com/en/maker/projects/how-to-use-embedded-machine-learning-to-do-speech-recognition-on-arduino/1d5dd38c05d9494180d5e5b7b657804d
In terms of processing continuous speech to text, you will need to use a different speech processor on your Arduino to accomplish this, such as: https://create.arduino.cc/projecthub/msb4180/speech-recognition-and-synthesis-with-arduino-2f0363
Please let me know if you have any questions!
@jenny Thanks for the references.
My main purpose wasn’t to create a voice command. I’m learning how to develop a voice recognition device that converts human speech into text and then saves the text on a microcontroller (Arduino), or maybe in the cloud. From the human speech, the Arduino should recognize the request and provide feedback/a service back to the user. Kind of like an API concept: the Arduino will try its best to find the closest match to the user’s speech.
By the way, the third link (How to Use Embedded Machine Learning to Do Speech Recognition on Arduino: https://www.digikey.com/en/maker/projects/how-to-use-embedded-machine-learning-to-do-speech-recognition-on-arduino/1d5dd38c05d9494180d5e5b7b657804d) was really helpful. Thanks!
Your best bet will be to stream the audio to a cloud service that does speech-to-text (I know Azure has offerings for this). Even the processing power on phones these days is not enough to do this very accurately on-device.
Typical systems that do this (Siri, Alexa, etc.) use a combination of on-device and in-cloud processing: on-device they’re listening for a wake word (you can build this with Edge Impulse), and when the wake word is heard they send the data to the cloud to be analyzed.
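The on-device/in-cloud split described above is essentially a small state machine: listen locally until the wake word fires, buffer the following audio, then hand it off. A minimal sketch, where `wake_word_detected` stands in for an on-device classifier (such as an Edge Impulse model) and `send_to_cloud` for a speech-to-text API call — both names are hypothetical placeholders, not a real Edge Impulse API:

```python
# Illustrative sketch of the wake-word pipeline: stay in LISTENING until
# the on-device detector fires, then buffer frames and send them to the
# cloud ASR once the utterance window is full.

LISTENING, BUFFERING = "listening", "buffering"

class WakeWordPipeline:
    def __init__(self, wake_word_detected, send_to_cloud, utterance_frames=50):
        self.detect = wake_word_detected   # audio frame -> bool (on-device model)
        self.send = send_to_cloud          # list of frames -> transcript (cloud ASR)
        self.utterance_frames = utterance_frames
        self.state = LISTENING
        self.buffer = []

    def process_frame(self, frame):
        """Feed one audio frame; returns a transcript when an utterance completes."""
        if self.state == LISTENING:
            if self.detect(frame):         # wake word heard on-device
                self.state = BUFFERING
                self.buffer = []
        else:
            self.buffer.append(frame)      # record until the utterance window fills
            if len(self.buffer) >= self.utterance_frames:
                self.state = LISTENING
                return self.send(self.buffer)  # hand off to the cloud for analysis
        return None
```

On an actual Arduino you would write the same loop in C++ around the Edge Impulse classifier output, but the control flow is the same: the expensive transcription only runs on audio captured after the cheap local detector fires.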
Yes, I do understand the accuracy limitations, and that ASR performance depends on many factors (e.g. speaker gender, CPU speed and utilization, speech impairments, speech quality, microphone, language and slang, etc.).
Yes, a wake word is normally used to wake up the microcontroller. I will do that and include a wake word for an easy testing example. However, how can the audio be sent to the cloud to be analyzed? For the cloud analysis, should I start with Azure? In the past few days, I’ve been trying to sign up as a new Microsoft Azure user. Unfortunately, it said my current account type is not supported:
> Your current account type is not supported
> Create a new account and sign up
> [If you are an IT admin, you can take over the directory]
> [Still have questions?]
I wonder if Azure provides limited basic access, or whether I must subscribe to a license in order to use it.
I’m not entirely sure, I’ve not used the Azure one (but I have had my fair share of problems getting into Azure, so I feel your pain). Alternatively, Google (and I guess every other cloud vendor) has something similar too, e.g. https://cloud.google.com/speech-to-text
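To give a concrete idea of what "sending the audio to the cloud" looks like with the Google service above: its synchronous `speech:recognize` REST endpoint takes a JSON body with a `config` section and the audio inline as base64. A sketch of building that body (field names follow Google's public API reference, but double-check them against the current docs before relying on this):

```python
import base64
import json

def build_recognize_request(wav_bytes, sample_rate=16000, language="en-US"):
    """Build the JSON body for Google Cloud Speech-to-Text's v1
    speech:recognize endpoint (synchronous recognition)."""
    return json.dumps({
        "config": {
            "encoding": "LINEAR16",          # raw 16-bit PCM, as in a plain WAV file
            "sampleRateHertz": sample_rate,  # must match how the audio was captured
            "languageCode": language,
        },
        "audio": {
            # Inline base64 audio; longer clips need a Cloud Storage URI
            # instead of inline content.
            "content": base64.b64encode(wav_bytes).decode("ascii"),
        },
    })
```

You would POST this body to the `speech:recognize` endpoint with an API key or OAuth token, and the transcript comes back in the response's results. The Arduino side only needs to capture the PCM samples after the wake word and hand them to whatever device (or gateway) makes this HTTPS call.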