I have been working on building an audio sensor that can detect sirens. I put together a report of the things I learned while doing it, things to take into consideration and the mistakes I made.
I also put together some tools: firmware for capturing audio samples on the microcontroller and saving the raw PCM data to an SD card, plus a Python script for converting that data to a WAV file. I found them helpful for diagnosing problems in the field and for collecting training data.
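For anyone who just wants the idea of the conversion step, here is a minimal sketch using Python's standard `wave` module. It is not the script from the toolkit; the function name and the defaults (16 kHz, mono, 16-bit little-endian samples) are assumptions, so adjust them to match how your firmware actually writes the PCM data.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container.

    Assumed format (not taken from the toolkit): 16 kHz, mono,
    16-bit little-endian signed samples. Returns the number of frames written.
    """
    with open(pcm_path, "rb") as pcm_file:
        pcm_bytes = pcm_file.read()

    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)

    return len(pcm_bytes) // (channels * sample_width)
```

If the resulting file sounds too fast, too slow, or like noise, the sample rate or sample width probably does not match what the microcontroller recorded.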
The guide is here: https://github.com/IQTLabs/Audio-Sensor-Toolkit/blob/main/guide/overview.md
And the tools are here: https://github.com/IQTLabs/Audio-Sensor-Toolkit/tree/main/sound
There are probably better approaches and improvements that can be made… I will happily take PRs to the code and writing!
Hi @Robotastic, fantastic guide!
As you noted you can do a bit of postprocessing on the classification results to filter out false positives. I guess you already use the continuous audio classification, which already applies a moving average filter to smooth out predictions, but we also have the ei_smooth_t structure which lets you quickly define rules like: “I need to look back four frames, and min. 2 of them should be 80%+ labeled as siren”. Not sure if this would filter out “Siren Event 3” but for everything else this would help.
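To make that kind of rule concrete, here is a rough Python sketch of the logic only. It does not use or mirror the actual `ei_smooth_t` API; the class name `SirenVote` and its parameters are made up for illustration.

```python
from collections import deque


class SirenVote:
    """Hypothetical helper: report a siren only when enough recent frames agree."""

    def __init__(self, lookback=4, min_hits=2, threshold=0.8):
        # Keep only the last `lookback` frame decisions.
        self.history = deque(maxlen=lookback)
        self.min_hits = min_hits
        self.threshold = threshold

    def update(self, siren_score):
        """Record one frame's siren score (0.0 to 1.0) and return the filtered decision."""
        self.history.append(siren_score >= self.threshold)
        return sum(self.history) >= self.min_hits


vote = SirenVote(lookback=4, min_hits=2, threshold=0.8)
scores = [0.9, 0.1, 0.1, 0.1, 0.1, 0.85, 0.9, 0.2]
decisions = [vote.update(s) for s in scores]
# A lone 0.9 at the start is ignored; two confident frames close together trigger a detection.
```

With these numbers, the isolated spike in the first frame never produces a detection, while the pair of confident frames near the end does.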
Thanks @janjongboom !
That is a great point - I ended up not running inference in Continuous mode, where it averages across slices in a window. I was being cautious and wanted to make sure I was able to save the buffer to the SD card without causing gaps in the recorded audio. I did this work way back in the Fall and it just took me forever to write things up.
I will go add a note about this. I am also going to try re-running the experiment. Thanks to Eon, I should have more cycles and memory to work with, and I think I can iteratively record the slices to a file and reconstruct the Window that way. Since I have already found that there were no false negatives that were not part of a larger event… I think I can get away with not having to listen to all the recordings this time!
The ei_smooth_t structure looks great! I think that should cover most of the post-processing filtering that's needed.
Ah, in that case either constructing a moving average filter or using the smooth struct should work!
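For reference, a hand-rolled moving average over per-window siren scores could look something like the sketch below. This is only an illustration of the idea, not the filter used inside the continuous classification code; the class name and window size are assumptions.

```python
from collections import deque


class MovingAverage:
    """Hypothetical smoother: average the most recent `window` scores."""

    def __init__(self, window=4):
        self.values = deque(maxlen=window)

    def update(self, score):
        """Add one score and return the mean of the scores currently in the window."""
        self.values.append(score)
        return sum(self.values) / len(self.values)


smoother = MovingAverage(window=2)
smoothed = [smoother.update(s) for s in [1.0, 0.0, 0.0]]
```

A single high score gets pulled down by its neighbours, so a threshold applied to the smoothed value is less likely to fire on one-off spikes.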