Poor on-device performance when using the Syntiant DSP block and the Nicla Voice board

Question/Issue:
When I use the Syntiant DSP block, I only get poor on-device performance for my own models. I have also tried public projects, but unfortunately I am unable to achieve better results. When I train a model with the MFCC DSP block, it works perfectly on my phone. However, the MFCC DSP block cannot be used for Nicla Voice. The Syntiant block behaves similarly to the Audio MFE block, which is good for recognizing non-voice audio. As a last resort, I am now using the Transfer Learning block and achieving good results with it. It feels like I’m doing something wrong.

Project ID:
892621, My current project with just a bunch of audio samples
892624, Basically a clone of 42868, just built for the Arduino Nicla Voice

Context/Use case:
As part of my computer science studies, I am investigating the use of local language models as opposed to conventional solutions such as Alexa and Siri. I would therefore like to create a small prototype that recognizes 3-5 words and triggers actions in HomeAssistant via BLE depending on the word recognized.
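The word-to-action dispatch described above could be sketched roughly as follows. This is a hypothetical illustration only: the keyword labels, action names, and confidence threshold are made up, and the BLE transport to HomeAssistant is omitted.

```python
# Hypothetical mapping from recognized keywords to HomeAssistant service calls.
# Labels and service names are illustrative, not taken from the project.
ACTIONS = {
    "lights": "light.toggle",
    "music": "media_player.media_play_pause",
    "stop": "media_player.media_stop",
}

def dispatch(label: str, confidence: float, threshold: float = 0.8):
    """Return the HomeAssistant service to trigger, or None.

    A recognized keyword only fires an action when its classifier
    confidence is at or above the (assumed) threshold.
    """
    if confidence < threshold:
        return None
    return ACTIONS.get(label)
```

In a real deployment the returned service name would be sent over BLE to a bridge that forwards it to HomeAssistant.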

Steps Taken:

  1. Increased data set size to 10-15 minutes per label
  2. Modified samples (Record samples by phone/board, shift samples)
  3. Modified impulse configuration (number of training cycles, learning rate, auto-weight classes, data augmentation, following the guide ‘Increasing model performance’)
  4. Use a finished project like 42868 or 297564
  5. Switched learning block to Transfer Learning (Keyword Spotting)

Expected Outcome:
Decent on-device performance, at least when using public projects like 42868

Actual Outcome:
Poor on-device performance with both own and public projects when using the Syntiant DSP block. Only the ‘Transfer Learning’ learning block produces decent results.

Reproducibility:

  • [X] Always
  • [ ] Sometimes
  • [ ] Rarely

Environment:

  • Platform: Arduino Nicla Voice
  • Build Environment Details:
  • OS Version: Windows 11 25H2
  • Edge Impulse Version (Firmware): Pre-compiled firmware
  • Edge Impulse CLI Version: 1.37.2
  • Project Version: 1.0.0
  • Custom Blocks / Impulse Configuration:
    No custom blocks, Syntiant DSP, Classification learning block

Logs/Attachments:

Additional Information:
There might be a similar issue mentioned by @madhu_sajc in bug report 14489.
I recorded multiple samples using the built-in microphone. Therefore, it is unlikely that the issue is hardware-related.

Hi @tobias4433

I just tested the Syntiant-RC-Go-Stop-NDP120 project and it works fine for me.
It’s the GO STOP public project for the NDP120.
Does it perform OK for you?

About your project, I have tested the Test-Project-Phone-Client.
With the classification block, performance is poor, and the precision and accuracy in model testing also indicate a bad model.
With the transfer learning block, the results are better.

You can try playing with the posterior parameters; let me know if you need assistance with them.

regards,
fv

Hi @ei_francesco

I have tested the Syntiant-RC-Go-Stop project and found that, although it performs well in the studio, its performance is not as good after deployment on the Arduino Nicla Voice. The ‘Go’ class can be triggered consistently by saying unrelated words such as ‘hello’, so it does not seem very robust.

I originally created the project Test-Project-Phone-Client in order to test the transfer learning block. However, I do understand that this model will not perform well with the classification learning block due to the small size of the dataset.

The Syntiant/Classification models appear to work well in the studio, but perform poorly on the board. Models with MFCC/Classification perform well in both the studio and on the phone client, but cannot be deployed to the Nicla Voice for comparison. So, is Syntiant/Transfer Learning really the only way to create a robust model?

Thank you for your help.

Best regards
Tobias

Hi @tobias4433,

Use what works best for your use case.

The posterior parameters can be useful to fine-tune the performance of the model.

Here’s a quick explanation of each parameter:

  • phwin: minimum length of a continuous region of strong average activation
  • pth: threshold for activation
  • phbackoff: number of time steps during which no action is taken even if a class meets its requirements; the backoff only starts after a match
  • smoothing_queue_size: size of the sliding window used to smooth the activations over time steps
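Taken together, these parameters describe a smooth/threshold/backoff pipeline. The sketch below illustrates that idea only; it is a hypothetical re-implementation for clarity, not the NDP120 firmware’s actual logic, and the default values are made up.

```python
from collections import deque

class PosteriorHandler:
    """Illustrative posterior handling for one keyword class.

    Hypothetical sketch: raw activations are smoothed over a sliding
    window (smoothing_queue_size), a match requires phwin consecutive
    smoothed values at or above pth, and after a match no further
    matches are reported for phbackoff time steps.
    """

    def __init__(self, pth=0.7, phwin=3, phbackoff=10, smoothing_queue_size=5):
        self.pth = pth                          # activation threshold
        self.phwin = phwin                      # min consecutive frames above pth
        self.phbackoff = phbackoff              # frames ignored after a match
        self.queue = deque(maxlen=smoothing_queue_size)
        self.run = 0                            # current run length above threshold
        self.backoff = 0                        # remaining backoff frames

    def step(self, activation: float) -> bool:
        """Feed one raw activation; return True on a confirmed match."""
        self.queue.append(activation)
        smoothed = sum(self.queue) / len(self.queue)
        if self.backoff > 0:
            self.backoff -= 1                   # still in backoff: suppress matches
            return False
        if smoothed >= self.pth:
            self.run += 1
        else:
            self.run = 0
        if self.run >= self.phwin:              # enough strong frames in a row
            self.run = 0
            self.backoff = self.phbackoff
            return True
        return False
```

With this mental model, raising pth or phwin makes triggering stricter (fewer false positives like the ‘Go’-on-‘hello’ case above), while phbackoff mainly prevents one utterance from firing repeatedly.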

regards,
fv
