Improving accuracy on NDP120 model

Question/Issue:
Even after adding an external dataset, my accuracy is below 40%.
Project ID:
AS4065U-project-1
Context/Use case:
[Provide context or use case where the issue is encountered]

Steps Taken:

  1. Made my own dataset
  2. Downloaded an external dataset
  3. Added my recordings to the downloaded dataset

Expected Outcome:
I would like my accuracy to be closer to 100%.
Actual Outcome:
The accuracy stays below 40% no matter what changes I make.
Reproducibility:

  • [X] Always
  • [ ] Sometimes
  • [ ] Rarely

Environment:

  • Platform: Arduino Nicla Voice

  • Build Environment Details: Arduino IDE

  • OS Version: Windows 11 Home

  • Edge Impulse Version (Firmware): [e.g., 1.2.3]

  • To find out Edge Impulse Version:

  • if you have pre-compiled firmware: run edge-impulse-run-impulse --raw and type AT+INFO. Look for Edge Impulse version in the output.

  • if you have a library deployment: inside the unarchived deployment, open model-parameters/model_metadata.h and look for EI_STUDIO_VERSION_MAJOR, EI_STUDIO_VERSION_MINOR, EI_STUDIO_VERSION_PATCH

  • Edge Impulse CLI Version: [e.g., 1.5.0]

  • Project Version: [e.g., 1.0.0]

  • Custom Blocks / Impulse Configuration: [Describe custom blocks used or impulse configuration]

Logs/Attachments:
[Include any logs or screenshots that may help in diagnosing the issue]
Additional Information:
Dataset: MLend_numbers

Have you had a look at the Training Graphs for your model (available on the Model Training screen)? See: New in Studio: Training Graphs (yes… finally!) and TensorBoard Integration

These might give you an insight into whether your model training is converging or whether it needs to train for longer (more epochs).
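As a rough illustration of what "still converging" means, here is a minimal Python sketch that compares the average validation loss over the last few epochs with the few before that. The function name `still_improving`, the 5-epoch window and the 0.01 threshold are all assumptions made up for this example; this is not something Studio exposes, and you would read the loss values off the training graphs or the training log yourself.

```python
def still_improving(val_losses, window=5, min_delta=0.01):
    """Return True if the mean validation loss over the last `window` epochs
    is at least `min_delta` lower than the mean over the `window` epochs before.

    `window` and `min_delta` are illustrative values, not Edge Impulse settings.
    """
    if len(val_losses) < 2 * window:
        # Too few epochs to judge a trend, so assume more training may help.
        return True
    recent = sum(val_losses[-window:]) / window
    previous = sum(val_losses[-2 * window:-window]) / window
    return (previous - recent) >= min_delta


# Loss is still falling steadily: more epochs are likely to help.
print(still_improving([2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6]))  # True

# Loss has flattened out: more epochs alone probably will not help.
print(still_improving([1.0] * 10))  # False
```

If the loss has flattened while accuracy is still low, adding epochs is unlikely to be the fix, and the data or the impulse setup is the better place to look.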

Another useful tool is the data explorer: Data explorer - Edge Impulse Documentation

If the classes in your dataset do not separate into distinct clusters there, it may be that your data is of poor quality.

The MLend dataset is very big, so it’s more than likely that you’ll need to train for longer (more epochs) to get a good quality model.

Finally, how have you set up your Impulse? Is your chosen window size big enough to match the length of the audio events (spoken words?) you’re trying to detect? If you’re trying to detect a word that takes 1 second to say with a window size of 300 ms, you’ll struggle to train an accurate model, because the training data will be chopped into pieces and no single window will contain the whole word.
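To make the window-size point concrete, here is a small Python sketch of how a recording is cut into fixed-size windows. It is a simplified model, not Studio's exact windowing code: the function names are invented for this example, `stride_ms` stands in for the step between successive windows, and padding of short samples is ignored.

```python
def cut_windows(sample_ms, window_ms, stride_ms):
    """Return the (start, end) times in ms of every full window cut from a sample."""
    if window_ms > sample_ms:
        return []
    return [(start, start + window_ms)
            for start in range(0, sample_ms - window_ms + 1, stride_ms)]


def any_window_holds_event(sample_ms, window_ms, stride_ms, event_start_ms, event_end_ms):
    """True if at least one window contains the whole event (e.g. a spoken word)."""
    return any(start <= event_start_ms and end >= event_end_ms
               for start, end in cut_windows(sample_ms, window_ms, stride_ms))


# A 1-second recording in which the spoken word fills the whole second.
print(cut_windows(1000, 300, 150))
# [(0, 300), (150, 450), (300, 600), (450, 750), (600, 900)]

print(any_window_holds_event(1000, 300, 150, 0, 1000))   # False: every window holds only a fragment
print(any_window_holds_event(1000, 1000, 500, 0, 1000))  # True: one window holds the whole word
```

With the 300 ms window, each training example is only a fragment of the word, and fragments of different numbers can sound very alike, which is one plausible reason for accuracy staying low.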