Preparing audio

Hello all,

I want to train my microcontroller (uC) to recognize particular words. When I call the device I would like to give it an order, something like “hey jack, turn on, at 1” or “hey jack, turn on, at 2”, and so on. Eventually I would also like to ask Jack to turn off, like “hey jack, turn off”.

Therefore, I have recorded plenty of different voices saying “hey jack”, “on”, “off”, and the numbers from 1 to 10. I recorded each word separately so I can manipulate them individually. However, I have read about the problems of recognizing an individual word like “one”, namely false positive recognitions…

Anyway, I am wondering: do I have to combine the words into one phrase, like the one I want the device to react to (“hey jack turn on at 1”), or should I train on individual words, like “one”, “two”, “hey jack”, etc.?

Would individual words be an appropriate way to start training the model for the device, or would it be better to train on the whole order, like “hey jack turn on at 1”?

Any reply and help is beneficial!

Looking forward!

BR

I’d make a distinction between “hey jack” as the initial trigger word (or maybe something with more syllables; I think it’s hard to build a very robust keyword spotting model with just two syllables) and the individual commands (turn on / at 2), which you train separately. The nice thing is that once you hear “hey jack” you can be confident that the next thing you get is a command from a whitelisted set of words, so you can ignore the times the model triggers on e.g. “turn on” if there was no “hey jack” right before it.
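
The gating idea can be sketched in a few lines of Python. This is only a minimal illustration, not code from any SDK: the label names (`heyjack`, `turn_on`, `turn_off`), the 3-second window and the `CommandGate` class are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_LABEL = "heyjack"                     # hypothetical label for the trigger phrase
COMMAND_LABELS = {"turn_on", "turn_off"}   # hypothetical whitelist of commands
WINDOW_S = 3.0                             # assumed time allowed between trigger and command


@dataclass
class CommandGate:
    """Accept a command only if the wake word was heard shortly before it."""
    window_s: float = WINDOW_S
    _armed_at: Optional[float] = None

    def feed(self, label: str, now: float) -> Optional[str]:
        """Feed one classifier result; return the accepted command or None."""
        if label == WAKE_LABEL:
            self._armed_at = now           # (re)arm the gate
            return None
        armed = (self._armed_at is not None
                 and now - self._armed_at <= self.window_s)
        if armed and label in COMMAND_LABELS:
            self._armed_at = None          # one command per wake word
            return label
        return None                        # ignore everything else
```

Feeding `turn_on` without a preceding `heyjack` returns `None`, so a false trigger on a bare command is simply dropped. Numbers ("at 1", "at 2") could be handled the same way with a second whitelist.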

Thank you. Indeed, I was thinking of going with “hey jack” followed by “turn on / at 2”, “… at 3”, etc. Would that be OK for training the model?

What do you think about mixing the recorded voices? E.g. if I took the “hey jack” from one person, the “turn on” from another, the number from a third, and so on? That would increase the variety even further, which would test the model even harder, wouldn’t it?

I am also using “dumb” samples, such as kids’ voices and words that have nothing to do with the commands, as noise and clutter…

BR.

@mu234 There’s no absolute truth, but in general adding more variance to your training set will make the model more robust. Small models in particular can end up fixated on a few specific voices, so adding variation early on is a good way to validate.
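
As a rough sketch of the "mix speakers" idea, the snippet below picks one clip per word so that each word in a phrase comes from a different speaker where possible. The `clips` layout, the speaker names and the file names are hypothetical; the function only selects paths and does not touch any audio.

```python
import random
from typing import Dict, List


def mix_speakers(clips: Dict[str, Dict[str, str]],
                 phrase: List[str],
                 count: int,
                 seed: int = 0) -> List[List[str]]:
    """Return `count` lists of clip paths, one path per word in `phrase`.

    clips[word][speaker] is the path of that speaker saying that word.
    Within one mix, a speaker is reused only if nobody else is left.
    """
    rng = random.Random(seed)              # seeded so the mixes are reproducible
    mixes = []
    for _ in range(count):
        used = set()
        paths = []
        for word in phrase:
            speakers = list(clips[word])
            fresh = [s for s in speakers if s not in used] or speakers
            pick = rng.choice(fresh)
            used.add(pick)
            paths.append(clips[word][pick])
        mixes.append(paths)
    return mixes
```

Each returned list could then be concatenated into one test recording, for example by passing the files to sox in order. Mixed phrases like this are probably most useful for testing; the training data can stay as separate per-phrase samples.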

Hi Jan

thank you for your input so far!

I have prepared the audio, including a lot of clutter: different gibberish chatter that also contains the keywords, but said separately and not in the context of the actual commands. I have even recorded my kids saying some words. Anyway, I am not really sure I understand the whole learning concept.

For example, I now have:

  1. woman saying “hey jack”

  2. man saying “hey jack”

  3. man saying “turn on at 1 … 10”

  4. woman saying “turn on at 1 … 10”

  5. man saying “off”

  6. woman saying “off”

Indeed, there are more recordings of women than of men. I have also added two kids, but this is marginal: there are only two, and their pronunciation is too poor to really count as voice rather than noise. Still, I have used them as voice.

As you can see, I have recorded the woman and the man separately, but I guess I should put them under one label, right? For example, “hey jack” said by a woman or a man should be under the same label in data acquisition, and the same should hold for the rest of the words, like turn on 1…10 and off?

Looking forward to start dealing with impulses, etc.

BR

Yes, all recordings of the same phrase should share one label regardless of who says it (heyjack or something), with a separate label per phrase.
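
To make the labelling concrete, here is a small sketch that groups recordings by phrase and ignores the speaker. It assumes a made-up naming scheme of `<label>.<speaker>.wav` (e.g. `heyjack.woman01.wav`); the helper names are hypothetical too.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def label_for(path: str) -> str:
    """Label = the part of the file name before the first dot (assumed scheme)."""
    return Path(path).name.split(".", 1)[0]


def group_by_label(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collect all recordings of the same phrase under one label."""
    groups: Dict[str, List[str]] = {}
    for p in paths:
        groups.setdefault(label_for(p), []).append(p)
    return groups
```

With this scheme, `heyjack.woman01.wav` and `heyjack.man01.wav` both end up under `heyjack`, while `off.man01.wav` gets its own `off` label.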

Cheers Jan, I went through training with a 1D convolutional network and got an accuracy of 96% or more. However, when I try to run Live classification I get an error that my sample rate is incorrect: the training audio is at 44.1 kHz, while the live “recording” wants 16 kHz!

I guess one way would be to resample the training/testing audio, but that would be time consuming. Is there another way, or should the data be downloaded, resampled and uploaded again?

PS. is there a function within the audio processing block to do this?

Hi @mu234, we currently only have downsampling in our enterprise version, not in the free projects for now. But if you’re using your phone you can test at 44.1 kHz.

You’ll probably want to downsample with e.g. sox and retrain at 16 kHz, e.g. by:

  1. Export your raw data (Dashboard > Export, and selecting WAV files).
  2. Convert everything:
```
cd whereeveryouexporteddata
mkdir -p 16khz/training
find training -maxdepth 1 -type f -exec sox {} -r 16000 16khz/{} \;
mkdir -p 16khz/testing
find testing -maxdepth 1 -type f -exec sox {} -r 16000 16khz/{} \;
```
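
Before re-uploading, it may be worth checking that every converted file really is at 16 kHz. The following is a minimal sketch using only Python's standard `wave` module (which reads plain PCM WAV files); the function name and the `16khz` folder in the usage note are assumptions based on the commands above.

```python
import wave
from pathlib import Path
from typing import List, Tuple

TARGET_HZ = 16000  # sample rate that live classification expects in this thread


def wrong_rate_files(folder: str, target_hz: int = TARGET_HZ) -> List[Tuple[str, int]]:
    """Return (path, sample_rate) for every .wav under `folder` not at `target_hz`."""
    bad = []
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != target_hz:
            bad.append((str(path), rate))
    return bad
```

Calling `wrong_rate_files("16khz")` should return an empty list if the conversion worked; anything it lists still needs resampling.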

Cheers Jan! This is exactly what I have done; sox is a really good and simple way of doing this…

Now I have moved on and done the NN stage too. I can download the bin to the board via the command line, namely with the edge-impulse-flash…, and it works. However, I would like to go further.
Therefore, I am wondering how to bring that C code into the Segger IDE. Any ideas?

I guess there are instructions that could be followed? In addition, I guess this should be done with the “edge-impulse-standalone.emProject” in the Segger IDE, but I cannot compile “edge-impulse-standalone.emProject”; I get an error, see below:

Building 'edge-impulse-standalone' from solution 'edge-impulse-standalone' in configuration 'Debug'
1> build/bin/FreeRTOS_CLI.o does not exist.
1> Compiling 'FreeRTOS_CLI.c'
1> /opt/gcc-arm-none-eabi-8-2018-q4-major/bin/arm-none-eabi-gcc -c -mthumb -fno-exceptions -Wno-unused-variable -nostartfiles -Wno-unused-parameter -Wno-parentheses -Wno-unused-function -ggdb -fno-common -fmessage-length=0 -std=gnu99 -DSRAM -DPROJ_NAME=edge-impulse-standalone -DECM3532 -I../../Thirdparty/edge_impulse -I../../Thirdparty/edge_impulse/edge-impulse-sdk/CMSIS/DSP/Include -I../../Thirdparty/edge_impulse/edge-impulse-sdk/CMSIS/Core/Include -I../../Platform/ECM3532/M3/hw/hal/common/include -I../../Platform/ECM3532/Common/framework/inc -I../../eta_ai_bsp/inc -Isrc -I../../Platform/ECM3532/M3/framework/executor/include/pub -I../../Platform/ECM3532/M3/NN_kernels/include -I. -I../../Platform/ECM3532/M3 -I../../Platform/ECM3532/M3/util/include -I../../Platform/ECM3532/M3/hw/include -I../../Platform/ECM3532/M3/hw/include/ecm3532 -I../../Platform/ECM3532/M3/hw/board/ecm3532/ai_vision/include -I../../Platform/ECM3532/M3/util/console/include -I../../Platform/ECM3532/M3/util/dsp_helper/include -Iinclude -I../../Platform/ECM3532/M3/hw/csp/ecm3532/common/csp/inc -I../../Platform/ECM3532/M3/hw/csp/ecm3532/m3/reg/inc -I../../Platform/ECM3532/M3/hw/csp/common/inc -I../../Platform/ECM3532/M3/hw/csp/ecm3532/m3/csp/inc -I../../Thirdparty/FreeRTOS/Source/include -I../../Thirdparty/FreeRTOS/Source/portable/GCC/ARM_CM3 -I../../Thirdparty/FreeRTOS-Plus/Source/FreeRTOS-Plus-CLI -I../../Platform/ECM3532/M3/hw/hal/common/include -I../../Platform/ECM3532/M3/hw/hal/ecm3532/include -I../../Platform/ECM3532/M3/hw/drivers/spi_flash/common -I../../Platform/ECM3532/M3/hw/drivers/spi_flash/maxim -I../../Platform/ECM3532/M3/framework/rpc/src -I../../Platform/ECM3532/M3/framework/rpc/include -I../../Platform/ECM3532/Common/framework/inc -I../../Platform/ECM3532/M3/hw/ipc/common/include ../../Thirdparty/FreeRTOS-Plus/Source/FreeRTOS-Plus-CLI/FreeRTOS_CLI.c -MD -MF build/bin/FreeRTOS_CLI.d -fno-diagnostics-show-caret -o build/bin/FreeRTOS_CLI.o -O3 -g -ffunction-sections -fdata-sections -Wall 
-mcpu=cortex-m3 -mfpu=vfp -mfloat-abi=soft -mlittle-endian
2> build/bin/croutine.o does not exist.
2> Compiling 'croutine.c'
2> /opt/gcc-arm-none-eabi-8-2018-q4-major/bin/arm-none-eabi-gcc -c -mthumb -fno-exceptions -Wno-unused-variable -nostartfiles -Wno-unused-parameter -Wno-parentheses -Wno-unused-function -ggdb -fno-common -fmessage-length=0 -std=gnu99 -DSRAM -DPROJ_NAME=edge-impulse-standalone -DECM3532 -I../../Thirdparty/edge_impulse -I../../Thirdparty/edge_impulse/edge-impulse-sdk/CMSIS/DSP/Include -I../../Thirdparty/edge_impulse/edge-impulse-sdk/CMSIS/Core/Include -I../../Platform/ECM3532/M3/hw/hal/common/include -I../../Platform/ECM3532/Common/framework/inc -I../../eta_ai_bsp/inc -Isrc -I../../Platform/ECM3532/M3/framework/executor/include/pub -I../../Platform/ECM3532/M3/NN_kernels/include -I. -I../../Platform/ECM3532/M3 -I../../Platform/ECM3532/M3/util/include -I../../Platform/ECM3532/M3/hw/include -I../../Platform/ECM3532/M3/hw/include/ecm3532 -I../../Platform/ECM3532/M3/hw/board/ecm3532/ai_vision/include -I../../Platform/ECM3532/M3/util/console/include -I../../Platform/ECM3532/M3/util/dsp_helper/include -Iinclude -I../../Platform/ECM3532/M3/hw/csp/ecm3532/common/csp/inc -I../../Platform/ECM3532/M3/hw/csp/ecm3532/m3/reg/inc -I../../Platform/ECM3532/M3/hw/csp/common/inc -I../../Platform/ECM3532/M3/hw/csp/ecm3532/m3/csp/inc -I../../Thirdparty/FreeRTOS/Source/include -I../../Thirdparty/FreeRTOS/Source/portable/GCC/ARM_CM3 -I../../Thirdparty/FreeRTOS-Plus/Source/FreeRTOS-Plus-CLI -I../../Platform/ECM3532/M3/hw/hal/common/include -I../../Platform/ECM3532/M3/hw/hal/ecm3532/include -I../../Platform/ECM3532/M3/hw/drivers/spi_flash/common -I../../Platform/ECM3532/M3/hw/drivers/spi_flash/maxim -I../../Platform/ECM3532/M3/framework/rpc/src -I../../Platform/ECM3532/M3/framework/rpc/include -I../../Platform/ECM3532/Common/framework/inc -I../../Platform/ECM3532/M3/hw/ipc/common/include ../../Thirdparty/FreeRTOS/Source/croutine.c -MD -MF build/bin/croutine.d -fno-diagnostics-show-caret -o build/bin/croutine.o -O3 -g -ffunction-sections -fdata-sections -Wall -mcpu=cortex-m3 -mfpu=vfp 
-mfloat-abi=soft -mlittle-endian
3> build/bin/list.o does not exist.
2> In file included from ../../Thirdparty/FreeRTOS/Source/include/FreeRTOS.h:56,
2>                  from ../../Thirdparty/FreeRTOS/Source/croutine.c:28:
2> ../../Thirdparty/FreeRTOS/Source/include/FreeRTOSConfig.h:30:10: fatal error: config.h: No such file or directory
2> compilation terminated.
3> Compiling 'list.c'
3> /opt/gcc-arm-none-eabi-8-2018-q4-major/bin/arm-none-eabi-gcc -c -mthumb -fno-exceptions -Wno-unused-variable -nostartfiles -Wno-unused-parameter -Wno-parentheses -Wno-unused-function -ggdb -fno-common -fmessage-length=0 -std=gnu99 -DSRAM -DPROJ_NAME=edge-impulse-standalone -DECM3532 -I../../Thirdparty/edge_impulse -I../../Thirdparty/edge_impulse/edge-impulse-sdk/CMSIS/DSP/Include -I../../Thirdparty/edge_impulse/edge-impulse-sdk/CMSIS/Core/Include -I../../Platform/ECM3532/M3/hw/hal/common/include -I../../Platform/ECM3532/Common/framework/inc -I../../eta_ai_bsp/inc -Isrc -I../../Platform/ECM3532/M3/framework/executor/include/pub -I../../Platform/ECM3532/M3/NN_kernels/include -I. -I../../Platform/ECM3532/M3 -I../../Platform/ECM3532/M3/util/include -I../../Platform/ECM3532/M3/hw/include -I../../Platform/ECM3532/M3/hw/include/ecm3532 -I../../Platform/ECM3532/M3/hw/board/ecm3532/ai_vision/include -I../../Platform/ECM3532/M3/util/console/include -I../../Platform/ECM3532/M3/util/dsp_helper/include -Iinclude -I../../Platform/ECM3532/M3/hw/csp/ecm3532/common/csp/inc -I../../Platform/ECM3532/M3/hw/csp/ecm3532/m3/reg/inc -I../../Platform/ECM3532/M3/hw/csp/common/inc -I../../Platform/ECM3532/M3/hw/csp/ecm3532/m3/csp/inc -I../../Thirdparty/FreeRTOS/Source/include -I../../Thirdparty/FreeRTOS/Source/portable/GCC/ARM_CM3 -I../../Thirdparty/FreeRTOS-Plus/Source/FreeRTOS-Plus-CLI -I../../Platform/ECM3532/M3/hw/hal/common/include -I../../Platform/ECM3532/M3/hw/hal/ecm3532/include -I../../Platform/ECM3532/M3/hw/drivers/spi_flash/common -I../../Platform/ECM3532/M3/hw/drivers/spi_flash/maxim -I../../Platform/ECM3532/M3/framework/rpc/src -I../../Platform/ECM3532/M3/framework/rpc/include -I../../Platform/ECM3532/Common/framework/inc -I../../Platform/ECM3532/M3/hw/ipc/common/include ../../Thirdparty/FreeRTOS/Source/list.c -MD -MF build/bin/list.d -fno-diagnostics-show-caret -o build/bin/list.o -O3 -g -ffunction-sections -fdata-sections -Wall -mcpu=cortex-m3 -mfpu=vfp -mfloat-abi=soft 
-mlittle-endian
1> In file included from ../../Thirdparty/FreeRTOS/Source/include/FreeRTOS.h:56,
1>                  from ../../Thirdparty/FreeRTOS-Plus/Source/FreeRTOS-Plus-CLI/FreeRTOS_CLI.c:33:
1> ../../Thirdparty/FreeRTOS/Source/include/FreeRTOSConfig.h:30:10: fatal error: config.h: No such file or directory
1> compilation terminated.
3> In file included from ../../Thirdparty/FreeRTOS/Source/include/FreeRTOS.h:56,
3>                  from ../../Thirdparty/FreeRTOS/Source/list.c:30:
3> ../../Thirdparty/FreeRTOS/Source/include/FreeRTOSConfig.h:30:10: fatal error: config.h: No such file or directory
3> compilation terminated.
Build failed

Any ideas how to fix this, so that I can actually make use of it? I just want to use the C code from Edge Impulse to turn the LEDs on my dev board on and off. I am on Linux and working with the eta… sensor board.

Any help is much appreciated!

BR.

PS: sorry for posting this here, probably it should have a different title, I am sure the forum admin will organise it properly…

@mu234 Good question… We normally just build with cmake / make as described in https://github.com/edgeimpulse/example-standalone-inferencing-ecm3532. I seem to remember from a Nordic project that the Segger IDE does not build the C++ parts, so that might be it.

Note that https://github.com/edgeimpulse/firmware-eta-compute-ecm3532 also has drivers for the audio and examples present already - might be easier to get started on that (you can replace the model by replacing the model-parameters / tflite-model folders).