Audio classification model issue

Question/Issue:
I built an audio classification model to detect pipeline vandalism sounds. I collected training data from YouTube for two classes: vandalism (drilling, hacksaw, hammer, electric saw) and normal (traffic, footsteps, ambience, train passing). The model achieved 89-93% accuracy on the Edge Impulse validation set. However, when I play the exact same audio from my training data directly to my INMP441 microphone, the model misclassifies it: normal sounds are predicted as vandalism.

Project ID:
967541
Context/Use case:
Pipeline vandalism detection system using an ESP32-S3 and an INMP441 microphone. The model is meant to distinguish vandalism sounds (drilling, cutting) from normal environmental sounds.

Steps Taken:

  1. Collected audio data from YouTube for both vandalism and normal classes
  2. Trained model on Edge Impulse achieving 89-93% accuracy
  3. Connected INMP441 microphone to ESP32-S3
  4. Played the exact training audio files from a phone speaker directly to the INMP441 microphone
  5. Used Edge Impulse Live Classification to observe predictions
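A note on step 4: even the exact training audio is transformed on its way through a phone speaker, the air, and the INMP441, which is often enough to shift predictions. A rough numpy sketch of that playback channel (the cutoff frequencies and noise level are illustrative assumptions, not measurements of this hardware):

```python
import numpy as np

def simulate_playback_channel(audio, sr=16000, noise_db=-40.0, seed=0):
    """Crudely simulate the phone-speaker -> air -> INMP441 path:
    band-limiting plus ambient noise. Illustrative only."""
    rng = np.random.default_rng(seed)
    # One-pole high-pass around 300 Hz: small speakers reproduce little bass.
    a = np.exp(-2 * np.pi * 300.0 / sr)
    hp = np.empty_like(audio)
    prev_x = prev_y = 0.0
    for i, x in enumerate(audio):
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        hp[i] = prev_y
    # One-pole low-pass around 6 kHz: speaker/mic high-frequency rolloff.
    b = 1.0 - np.exp(-2 * np.pi * 6000.0 / sr)
    lp = np.empty_like(hp)
    y = 0.0
    for i, x in enumerate(hp):
        y += b * (x - y)
        lp[i] = y
    # Broadband room noise at roughly noise_db relative to full scale.
    noise = rng.standard_normal(len(lp)) * 10 ** (noise_db / 20.0)
    return lp + noise

# A 1 s, 1 kHz test tone at 16 kHz: after the channel it is quieter,
# band-limited, and sits on a noise floor.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
out = simulate_playback_channel(tone, sr)
```

Passing training clips through a transform like this as augmentation can close part of the gap, but real samples captured with the deployed microphone remain the better fix.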

Expected Outcome:
The model should classify the sounds correctly, since they are the exact same audio files used in training.

Actual Outcome:
Normal sounds are consistently predicted as vandalism, even when the exact training audio is played through the microphone.

Reproducibility:

  • [x] Always

Environment:

  • Platform: ESP32-S3 with INMP441 microphone
  • Custom Blocks / Impulse Configuration: MFE processing block, 2D convolutional neural network, 1000 ms window size, 16 kHz sample rate, 300 Hz low frequency, -40 dB noise floor
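For context, the MFE block produces log mel-filterbank energies from each window. A rough numpy approximation using the settings above (16 kHz sample rate, 300 Hz low frequency, -40 dB noise floor; the frame length, stride, filter count, and normalization are assumed defaults, not read from this project):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfe(audio, sr=16000, frame_len=0.032, frame_stride=0.016,
        n_fft=512, n_mels=40, f_low=300.0, noise_floor_db=-40.0):
    """Log mel-filterbank energies, roughly mirroring an MFE block."""
    size, step = int(frame_len * sr), int(frame_stride * sr)
    n_frames = 1 + (len(audio) - size) // step
    frames = np.stack([audio[i * step:i * step + size] for i in range(n_frames)])
    frames = frames * np.hamming(size)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filters spanning f_low .. Nyquist.
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    feats = 10.0 * np.log10(np.maximum(power @ fbank.T, 1e-30))
    # Clip at the noise floor and scale to [0, 1] (a guess; Edge Impulse's
    # exact scaling may differ).
    return np.clip((feats - noise_floor_db) / -noise_floor_db, 0.0, 1.0)

# One 1000 ms window of a 1 kHz tone -> a (frames x mel bands) matrix.
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
feats = mfe(tone, sr)
```

Anything the playback chain does to levels or frequency balance shows up directly in these features, which is why a clean-audio model can drift on re-recorded input.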

Hi @davidolufemi521
So, you used Live Classification: how do the samples sound to you? Is there an unexpected amount of noise or distortion?
If yes, something might be wrong with the microphone. If not, then the model is likely overfit to the sound profile of the relatively clean YouTube audio. My suggestion would be to collect some samples with the INMP441 (start with about 10% of your training data) and retrain the model.
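Before retraining, it may also help to quantify how far the re-recorded audio has drifted from the YouTube originals. A small sketch comparing RMS level and spectral centroid (the two signals here are synthetic stand-ins for a training clip and its re-recording, not the poster's data):

```python
import numpy as np

def rms_db(x):
    """RMS level in dBFS for a signal scaled to [-1, 1]."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def spectral_centroid(x, sr=16000):
    """Magnitude-weighted mean frequency in Hz."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

# Synthetic stand-ins: a clean 1 kHz "training clip" and a quieter,
# noisier "re-recording" of it.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
rerec = (0.1 * np.sin(2 * np.pi * 1000.0 * t)
         + 0.02 * np.random.default_rng(0).standard_normal(sr))

print(f"level shift:    {rms_db(rerec) - rms_db(clean):+.1f} dB")
print(f"centroid shift: {spectral_centroid(rerec, sr) - spectral_centroid(clean, sr):+.0f} Hz")
```

Large shifts in either number point at a gain or frequency-response mismatch in the capture chain rather than a modeling problem.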
