Bad classification performance after deployment

Question/Issue:
Classification performance is poor after deployment, even though training accuracy reaches 100%.

Project ID:
867419

Context/Use case:
I am trying to classify ringtone-like audio melodies.

Steps Taken:

  1. Used the actual device to capture audio to an SD card (my own code).
    I now have 45 audio samples for each melody (with different background sounds).

  2. Setup and training
    I currently have 3 melodies to classify. I tried several different building blocks
    but settled on MFCC. Because these kinds of sounds contain a lot of quick tones and
    small timing variations, I increased the time resolution:
    Frame length 0.0125
    Frame stride 0.005
    There are about 4000 features now, but the separation is very good, with clearly separated clusters.
    Training reaches 100% accuracy within a couple of epochs.

  3. Deployment
    Deploying to an ESP32-S3 device.
    The EON compiler does not seem to work at all, so I am using TFLite.
    After making some changes to the example code (I posted a bug report about this on this forum),
    I matched the inference code to the method I use in my dataset recorder code.
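
For reference, the feature count from step 2 can be estimated with a rough sketch like the one below. It assumes the frame length and stride are in seconds, a 16 kHz sample rate, 13 MFCC coefficients per frame, and no padding of the last frame. None of these are stated above, the function name `mfcc_feature_count` is made up for illustration, and the framing used by the platform may differ.

```python
def mfcc_feature_count(window_s, frame_length_s, frame_stride_s,
                       num_coefficients=13, sample_rate_hz=16000):
    """Estimate the number of MFCC features for one classification window.

    Assumptions (not taken from the project): frames are not padded, and
    the defaults of 13 coefficients and 16 kHz are illustrative only.
    """
    # Work in whole samples to avoid floating-point surprises.
    window = round(window_s * sample_rate_hz)
    frame = round(frame_length_s * sample_rate_hz)
    stride = round(frame_stride_s * sample_rate_hz)
    if window < frame:
        return 0
    num_frames = (window - frame) // stride + 1
    return num_frames * num_coefficients


# Frame length 0.0125 and stride 0.005, for a few hypothetical window sizes.
for window_s in (1.0, 1.5, 2.0):
    print(window_s, mfcc_feature_count(window_s, 0.0125, 0.005))
```

Under these assumptions a 1.5 s window gives 3874 features, which is in the same range as the roughly 4000 mentioned in step 2, but the actual window size is not stated here.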

It works, but classification is very poor: one melody is detected sometimes, but usually the wrong one is detected, and the accuracy scores are never good.

When I run the model on my mobile phone, the results are also very bad.
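
One check that might help narrow this down is comparing basic signal statistics of a training recording against a buffer captured in the inference path, since a mismatch in level or DC offset would change the features the model sees. The sketch below is only an illustration: the helper names `pcm_stats` and `level_ratio_db` and the sample buffers are hypothetical, and it assumes both inputs are available as plain lists of PCM sample values.

```python
import math


def pcm_stats(samples):
    """Return DC offset, RMS (around the DC offset) and peak of PCM samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    n = len(samples)
    dc = sum(samples) / n
    rms = math.sqrt(sum((s - dc) ** 2 for s in samples) / n)
    peak = max(abs(s) for s in samples)
    return {"dc": dc, "rms": rms, "peak": peak}


def level_ratio_db(train_samples, live_samples):
    """RMS level of the live capture relative to the training data, in dB."""
    train_rms = pcm_stats(train_samples)["rms"]
    live_rms = pcm_stats(live_samples)["rms"]
    if train_rms == 0 or live_rms == 0:
        raise ValueError("cannot compare against a silent buffer")
    return 20 * math.log10(live_rms / train_rms)


# Hypothetical buffers: same pattern, live capture at half the amplitude.
train = [0, 1000, 0, -1000] * 4
live = [0, 500, 0, -500] * 4
print(pcm_stats(train))
print(round(level_ratio_db(train, live), 2))  # about -6.02 dB
```

A large level difference or a non-zero DC offset on only one side would suggest the recorder and inference paths are not feeding the model equivalent audio.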

Is this a case of overfitting? My instinct says these sounds should be very separable.
I thought 45 recordings per melody would be enough. Any help?
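
To test the overfitting question, one option is to hold out whole recordings and evaluate only on those, since 100% training accuracy on its own says little with 45 recordings per melody. The sketch below is a minimal illustration under assumptions: the function name `split_by_recording`, the file names, and the 20% test fraction are all hypothetical, and it only produces the split, not the evaluation itself.

```python
import random
from collections import defaultdict


def split_by_recording(recordings, test_fraction=0.2, seed=0):
    """Split (recording_id, label) pairs into train/test lists per label.

    Whole recordings are held out, so windows cut from one recording
    never end up on both sides of the split.
    """
    by_label = defaultdict(list)
    for recording_id, label in recordings:
        by_label[label].append(recording_id)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        # Hold out at least one recording per label.
        n_test = max(1, round(len(ids) * test_fraction))
        test += [(rid, label) for rid in ids[:n_test]]
        train += [(rid, label) for rid in ids[n_test:]]
    return train, test


# Hypothetical dataset: 3 melodies with 45 recordings each.
recordings = [(f"melody{m}_{i:02d}.wav", f"melody{m}")
              for m in range(3) for i in range(45)]
train, test = split_by_recording(recordings)
print(len(train), len(test))  # 108 27
```

If accuracy on the held-out recordings is also near 100% while live classification stays poor, the problem is more likely a mismatch between training and live audio than overfitting alone.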

Environment:

  • Platform: UM Feather ESP32-S3

Using the web platform to train and deploy as an Arduino library.