Question/Issue:
I built an audio classification model to detect pipeline vandalism sounds. I collected training data from YouTube for two classes: vandalism (drilling, hacksaw, hammer, electric saw) and normal (traffic, footsteps, ambience, train passing). The model achieved 89-93% accuracy on the Edge Impulse validation set. However, when I play the exact same audio from my training data through a phone speaker directly to my INMP441 microphone, the model misclassifies it: normal sounds are predicted as vandalism.
Project ID:
967541
Context/Use case:
Pipeline vandalism detection system using an ESP32-S3 and an INMP441 microphone. The model is meant to distinguish vandalism sounds (drilling, cutting) from normal environmental sounds.
Steps Taken:
- Collected audio data from YouTube for both vandalism and normal classes
- Trained model on Edge Impulse achieving 89-93% accuracy
- Connected INMP441 microphone to ESP32-S3
- Played the exact training audio files from a phone speaker directly to the INMP441 microphone
- Used Edge Impulse Live Classification to observe predictions
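For reference, here is a Python stand-in (not my actual firmware; the function name is made up) for the sample conversion step in the capture path above. The INMP441 outputs 24-bit samples MSB-aligned in 32-bit I2S words, so each raw word has to be arithmetic-shifted down before it matches the 16-bit PCM the model was trained on; getting this shift (or the channel slot) wrong is a common way for re-recorded audio to come out distorted:

```python
import struct

def i2s_words_to_pcm16(raw: bytes) -> list[int]:
    """Convert little-endian 32-bit I2S words to signed 16-bit PCM samples."""
    words = struct.unpack("<%di" % (len(raw) // 4), raw)
    # Keep the top 16 bits of each 32-bit word (24-bit data, MSB-aligned).
    return [w >> 16 for w in words]

# Example: a near-full-scale positive sample sitting in a 32-bit slot.
raw = struct.pack("<i", 0x7FFFFF00)
print(i2s_words_to_pcm16(raw))  # → [32767]
```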
Expected Outcome:
The model should correctly classify the sounds, since they are the exact same audio files used in training.
Actual Outcome:
Normal sounds are consistently predicted as vandalism, even when the exact training-data audio is played through the microphone.
Reproducibility:
- [x] Always
Environment:
- Platform: ESP32-S3 with INMP441 microphone
- Custom Blocks / Impulse Configuration: MFE processing block, 2D convolutional neural network, 1000 ms window size, 16 kHz sample rate, low frequency 300 Hz, noise floor -40 dB
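To make the impulse settings concrete, this is the framing arithmetic for a 1000 ms window at 16 kHz. The frame length and stride are assumed Edge Impulse MFE defaults (0.02 s / 0.01 s), not values read from project 967541:

```python
SAMPLE_RATE = 16_000      # Hz, from the impulse configuration
WINDOW_MS = 1000          # impulse window size
FRAME_LEN_S = 0.02        # assumed MFE frame length (Edge Impulse default)
FRAME_STRIDE_S = 0.01     # assumed MFE frame stride (Edge Impulse default)

window_samples = SAMPLE_RATE * WINDOW_MS // 1000
frame_len = int(SAMPLE_RATE * FRAME_LEN_S)       # samples per frame
frame_stride = int(SAMPLE_RATE * FRAME_STRIDE_S) # samples between frame starts

# Number of full frames that fit in one classification window.
num_frames = 1 + (window_samples - frame_len) // frame_stride
print(window_samples, frame_len, frame_stride, num_frames)  # → 16000 320 160 99
```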