Audio Sensor Guide & Tools

I have been building an audio sensor that can detect sirens, and I wrote up a report on what I learned along the way: things to take into consideration and the mistakes I made.

I also put together some tools for capturing audio samples on the microcontroller and saving the raw PCM data to an SD card, plus a Python script for converting that data to a WAV file. I found them helpful for diagnosing problems in the field and for collecting training data.
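For anyone who just wants the idea, converting raw PCM to WAV mostly means wrapping the samples in a WAV header. Here is a minimal sketch using Python's standard `wave` module. It is not the script from the repo; the function name `pcm_to_wav` and the defaults (16-bit signed little-endian mono at 16 kHz) are assumptions, so match them to how your microcontroller actually recorded:

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples from the SD card in a WAV header.

    Hypothetical helper. Defaults assume 16-bit signed mono at 16 kHz.
    The wave module treats input bytes as native byte order, so this
    assumes the host is little-endian like a typical ARM microcontroller.
    Returns the number of sample frames written.
    """
    with open(pcm_path, "rb") as f:
        data = f.read()

    # Drop any trailing partial frame (e.g. from a recording cut off mid-write)
    # so the frame count in the header matches the data.
    frame_size = channels * sample_width
    data = data[: len(data) - (len(data) % frame_size)]

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(data)

    return len(data) // frame_size
```

For example, `pcm_to_wav("rec.raw", "rec.wav")` (hypothetical filenames) writes a playable WAV file. If the sample rate or bit depth doesn't match the recording settings, the audio will play back at the wrong speed or sound like noise.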

The guide is here: https://github.com/IQTLabs/Audio-Sensor-Toolkit/blob/main/guide/overview.md

And the tools are here: https://github.com/IQTLabs/Audio-Sensor-Toolkit/tree/main/sound

There are probably better approaches and improvements to be made, so I will happily take PRs for both the code and the writing!

  • Luke

Hi @Robotastic, fantastic guide!

As you noted, you can do a bit of post-processing on the classification results to filter out false positives. I guess you already use continuous audio classification, which applies a moving average filter to smooth out predictions, but we also have the ei_smooth_t structure, which lets you quickly define rules like: “I need to look back four frames, and at least 2 of them should be labeled as siren with 80%+ confidence.” I'm not sure this would filter out “Siren Event 3”, but it would help with everything else.
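To make that kind of rule concrete, here is a small Python sketch of a "k of the last n frames" vote. It only illustrates the idea; it is not the ei_smooth_t API, and the class name, parameters, and label names are all hypothetical:

```python
from collections import deque


class VoteSmoother:
    """Hypothetical k-of-n rule: report a label only if at least `min_hits`
    of the last `history` frames had it as the top label with a confidence
    of at least `threshold`. Illustrative only, not the Edge Impulse API."""

    def __init__(self, history=4, min_hits=2, threshold=0.8):
        self.min_hits = min_hits
        self.threshold = threshold
        # Oldest frame drops off automatically once `history` frames are stored.
        self.frames = deque(maxlen=history)

    def update(self, scores):
        """scores: dict of label -> confidence for one inference frame.
        Returns the smoothed label, or None if no label passes the rule."""
        # Record this frame's top label, or None if it is not confident enough.
        label, conf = max(scores.items(), key=lambda kv: kv[1])
        self.frames.append(label if conf >= self.threshold else None)

        # Count how many recent frames voted for each label.
        counts = {}
        for voted in self.frames:
            if voted is not None:
                counts[voted] = counts.get(voted, 0) + 1

        best = max(counts.items(), key=lambda kv: kv[1], default=None)
        if best is not None and best[1] >= self.min_hits:
            return best[0]
        return None
```

Feeding it frames where siren scores 0.9, then 0.3, then 0.85 returns None, None, and then "siren", because the third frame is the second of the last four to clear 80%. A single stray high-confidence frame never triggers a detection on its own.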

Thanks @janjongboom !

That is a great point. I ended up not running inference in continuous mode, where predictions are averaged across the slices in a window. I was being cautious and wanted to make sure I could save the buffer to the SD card without causing gaps in the recorded audio. I did this work back in the fall, and it just took me forever to write it up.

I will add a note about this. I am also going to try re-running the experiment. Thanks to EON, I should have more cycles and memory to work with, and I think I can iteratively record the slices to a file and reconstruct the window that way. Since I already found that there were no false negatives that were not part of a larger event… I think I can get away with not listening to all the recordings this time! :headphones:
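Reconstructing the window from slices amounts to concatenating the most recent N slices each time a new one arrives. A rough Python sketch, assuming each slice was saved in recording order as an equal-length list of samples (the helper name `windows_from_slices` and the `slices_per_window` parameter are hypothetical):

```python
def windows_from_slices(slices, slices_per_window):
    """Rebuild each sliding inference window from sequentially recorded slices.

    Hypothetical helper. Assumes `slices` is a list of equal-length sample
    lists in recording order, and that each window is the most recent
    `slices_per_window` slices joined end to end. Yields one window per
    slice once enough history has accumulated.
    """
    for end in range(slices_per_window, len(slices) + 1):
        window = []
        for s in slices[end - slices_per_window:end]:
            window.extend(s)
        yield window
```

With 4 slices per window, for instance, each new slice produces a window that overlaps the previous one by three slices, so every recorded sample appears in up to four reconstructed windows.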

The ei_smooth_t structure looks great! I think that should cover most of the post-processing filtering that's needed.

Ah, in that case either constructing a moving average filter or using the smooth struct should work :ok_hand:!
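For comparison with the smooth struct's rule-based approach, a plain moving average over per-label confidences could look like the Python sketch below. The class and parameter names are hypothetical, and it is not necessarily how continuous mode averages internally:

```python
from collections import deque


class MovingAverage:
    """Hypothetical moving-average filter over per-label confidences.
    Averages each label's score across the last `n` frames."""

    def __init__(self, n=4):
        # Oldest frame is discarded automatically once `n` frames are stored.
        self.history = deque(maxlen=n)

    def update(self, scores):
        """scores: dict of label -> confidence for one frame.
        Returns a dict of label -> averaged confidence over stored frames."""
        self.history.append(dict(scores))
        count = len(self.history)
        return {
            label: sum(frame.get(label, 0.0) for frame in self.history) / count
            for label in scores
        }
```

Thresholding the averaged siren score (say at 0.8) instead of the raw per-frame score then acts as the filter; a larger `n` suppresses more short spikes at the cost of slower detection.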