Recognition of shooting event

Project ID: 191005

I am very new to machine learning and I am trying to explore a method to detect the position of poachers at a friend's place by analyzing the audio signals from the shots, so I would kindly ask for some general input from the community.

The scope of the ML algorithm is to detect/recognise 2 types of very short events among the background noise in continuous inference mode:

  • Bullet shockwave signal
  • Muzzle blast signal

Trilateration/triangulation methods are out of scope of this thread.

The waves should look like Figures 5 and 6 according to the paper Shooter localization and weapon classification with soldier-wearable networked sensors:

Citing the paper:

The most conspicuous characteristics of an acoustic shockwave…are the steep rising edges at the beginning and end of the signal. Also, the length of the N-wave is fairly predictable…and is relatively short (200-300 μs).

In contrast to shockwaves, the muzzle blast signatures are characterized by a long initial period (1-5 ms) where the first half period is significantly shorter than the second half [4].
…the real challenge for the matching detection core is to identify the first and second half periods properly.

And indeed this is (more or less) how the waves look in my data:

If the bullet is subsonic there will be no shockwave, only a muzzle blast:

Vice versa, if the gun is silenced, there will be a shockwave but no muzzle blast (notice the horizontal lines; that is the “bzzz” from the bullet once it is already subsonic):

Unfortunately, if a silenced gun is used along with subsonic ammunition, not much will be picked up.
Luckily, poachers usually use homemade silencers on long rifles (so some muzzle blast will be heard) and supersonic ammunition, so a bullet shockwave should be heard.

What is important to me is the accuracy of the timestamp of the event, not so much how long the algorithm takes to recognize it.
Meaning I need to know exactly when the event happened (within a few milliseconds), but I don’t mind waiting many seconds for the answer (ideally within 5 seconds, but not critical).
Question: Will the timestamp be determined by the beginning of the window, the end of the window, or the exact event (wave spike)? Forgive me if this seems like a silly/trivial question, but I have no idea, and an error of milliseconds will later translate into an error of meters in positioning.

Question: For the data, I extracted .wav files of 100 ms samples on average (the method is extracting files from the manual labels in the previous images). Is this correct?

Question: I also labeled the sound of the weapon fired from close by as muzzle, since technically it is a muzzle blast, but to the eye it looks very different. Will this induce errors?

Noise samples extracted are between 500 ms and 1 second on average.
In total, to start, I have 110 samples for muzzle, 86 for shockwave and 90 for noise. The train/test split is 80/20.

Questions: I obtained better results when I increased the window size to 60 ms and the window increase to 20 ms, compared to lower values such as 20 ms and 8 ms.
What is the risk of increasing the window too much?
What will happen if the same window contains both a shockwave and a muzzle blast? Will I get both results, or will one be masked? What should I do in this case: take the overlapping sounds (they are not exactly in sync but actually shifted, so they would be 2 different audios) and label them accordingly?

I am using the standard MFE processing block and the classifier learning block.
Question: Should I change the standard parameters in the MFE processing block for this scenario?

The classifier learning block also runs with standard settings, except that I increased the training cycles to 300 and changed the learning rate to 0.0005.

In the end I get 92.6% accuracy and 0.27 loss for the model. Looks good to me but maybe it is overfitting?

Testing the model with 64 samples gives an accuracy of 90.11%. Again, no idea if this is good enough:

The idea would be to run it on a Raspberry Pi Pico (RP2040) or some other low-cost device.

Any input would be highly appreciated.

It seems your model is working fairly well for a proof of concept. The Edge Impulse C++ library returns the inference time as part of the result of a call to classify(). So to get the time of the sound occurrence, you would subtract the inference time and then subtract your sampling time. The skew of the main loop may add a few milliseconds of uncertainty, so I’d suggest programming the bulk of your main loop in assembly.
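To make the arithmetic above concrete, here is a minimal sketch (in Python, with entirely hypothetical numbers) of the back-calculation: subtract the inference time and the window length to find when the window started, then add the offset of the spike sample inside the raw window to pin down the event itself:

```python
def spike_timestamp_ms(result_time_ms, inference_ms, window_ms,
                       peak_index, sample_rate_hz):
    """Back-calculate when the spike occurred.

    result_time_ms -- wall-clock time (ms) when classify() returned
    inference_ms   -- inference time reported in the result
    window_ms      -- length of the audio window that was classified
    peak_index     -- sample index of the spike inside the raw window
    sample_rate_hz -- audio sampling rate
    """
    window_start_ms = result_time_ms - inference_ms - window_ms
    return window_start_ms + 1000.0 * peak_index / sample_rate_hz


# Hypothetical numbers: result returned at t = 10000 ms, 42 ms inference,
# a 60 ms window, spike at sample 160 of a 16 kHz stream.
print(spike_timestamp_ms(10_000, 42, 60, 160, 16_000))  # 9908.0
```

Note the classifier alone only tells you the event is somewhere inside the window; to get millisecond accuracy you still need to locate the peak sample within the raw buffer.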


Thank you for your feedback.

Unrelated to my questions, but I hope it is easy enough, once I detect a sample classified as a “shockwave”, to send this same sample to a Python (or other) mathematical function that gives me the period between the two edges of the sound wave (this period allows calculating the perpendicular distance between the sensor and the trajectory of the bullet).
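Such a function could be sketched as below. This is only an illustration, not a tested implementation: it exploits the paper's observation that the N-wave has steep rising edges at its beginning and end, finds the two largest positive jumps in the first difference, and returns the time between them. The 0.5 relative threshold and the synthetic waveform are arbitrary assumptions.

```python
import numpy as np


def nwave_period_s(samples, sample_rate_hz, rel_threshold=0.5):
    """Estimate the N-wave period as the distance between the two
    steepest rising edges (largest positive jumps in the first
    difference of the waveform)."""
    x = np.asarray(samples, dtype=float)
    d = np.diff(x)                       # sample-to-sample slope
    thresh = rel_threshold * d.max()     # keep only the steepest edges
    edges = np.flatnonzero(d >= thresh)  # indices of the sharp rises
    return (edges[-1] - edges[0]) / sample_rate_hz


# Synthetic ideal N-wave at 1 MHz: sharp rise to +1, linear decay to -1
# over 200 samples, then a sharp rise back to zero (~200 us period).
x = np.zeros(300)
x[50:251] = np.linspace(1.0, -1.0, 201)
print(nwave_period_s(x, 1_000_000))  # ~201e-6 seconds
```

On real recordings you would likely need band-limiting and a sanity check that the two edges are within the expected 200-300 µs of each other.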

Hey @marcomillo

I think you’re on the right track with thinking in two stages…run a classifier to find windows where a shockwave exists, then do more processing on those windows to calculate the period.

Based on my experience, I think you have two ways to extract this period, and you should do so with convolution in the time domain (don’t try to reuse anything from the MFE or classifier blocks).

One method is to create two “matched filters” to detect the exact sample number of the leading edge of the signals: one filter for the muzzle blast, one for the shockwave. To create those filters, you could average a bunch of your recordings to create what we call “the replica”. You’re essentially just convolving with the replica, and wherever the peak output of that convolution is, you call that the time of arrival (in sample number).

The other option would be to autocorrelate the window (which should contain BOTH the muzzle blast and the shockwave); this should create a new signal with 3 peaks: one where the shockwave convolves with the muzzle blast, the center peak (the one that exists for any autocorrelation, where you’ve convolved the signal sample for sample with itself), and one where the muzzle blast convolves with the shockwave. Take the distance between peaks 1 and 2, or 2 and 3 (or average them), and you’ll get a period estimate.

Note, this generally works best when you have the same signal transmitted twice, but since the shockwave and muzzle have the same source, I think they should still peak up nicely when convolved with each other.
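As a rough illustration of the autocorrelation route (a sketch only; `min_lag` is an assumed guard to skip past the zero-lag main lobe, and real signals would need windowing and peak validation):

```python
import numpy as np


def autocorr_delay(window, min_lag):
    """Autocorrelate the window; the central peak sits at zero lag, and
    the first side peak beyond `min_lag` estimates the delay (in
    samples) between the two events contained in the window."""
    x = np.asarray(window, dtype=float)
    ac = np.correlate(x, x, mode="full")
    ac = ac[len(x) - 1:]        # keep non-negative lags only
    side = ac[min_lag:]         # skip the zero-lag main lobe
    return min_lag + int(np.argmax(side))


# Toy example: two copies of the same short pulse, 50 samples apart.
x = np.zeros(100)
x[10], x[11] = 1.0, -1.0
x[60], x[61] = 1.0, -1.0
print(autocorr_delay(x, 5))  # 50
```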

Good luck, would love to hear back what you decide and how it goes!


Hi, @AlexE,

I don’t know if I understood you correctly (convolution for now is way out of my league), but the only period I need to know is that of the shockwave itself, as in Figure 5 of my first post (between the first edge and the second edge).
I do not care about calculations for the muzzle blast, since I only need to know when it happens in time.

What I was thinking is: once the classifier recognizes a shockwave sample, keep that raw sample to perform calculations on it on the side, while the main inferencing process continues as usual.

Ah, ok, just the shockwave.

  • Yes, I agree, have one process classifying, and run another process on windows that classify as a shockwave
  • So you just need the duration of the shockwave, a single event? Not time BETWEEN shockwaves?

In that case, running it through a high-pass filter should help you find the correct peaks. Probably just measure the time delay between the two highest peaks, but you might want to constrain it to sensible ranges (like the two highest peaks that are < 10 ms apart, or whatever the maximum duration of a shockwave is).
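A sketch of that peak-pairing step, under stated assumptions: a first difference stands in for a proper high-pass filter, and `max_gap` is the sanity constraint (in samples) mentioned above:

```python
import numpy as np


def shock_period_samples(samples, max_gap):
    """Crude high-pass via first difference, then return the gap (in
    samples) between the two largest peaks that lie within `max_gap`
    samples of each other."""
    hp = np.abs(np.diff(np.asarray(samples, dtype=float)))
    first = int(np.argmax(hp))                     # strongest peak
    lo = max(0, first - max_gap)                   # allowed search range
    hi = min(len(hp), first + max_gap + 1)
    local = hp[lo:hi].copy()
    local[first - lo] = -np.inf                    # exclude the first peak
    second = lo + int(np.argmax(local))            # next strongest nearby
    return abs(second - first)


# Synthetic N-wave: sharp edges at samples 50 and 251.
x = np.zeros(300)
x[50:251] = np.linspace(1.0, -1.0, 201)
print(shock_period_samples(x, 10_000))  # 201
```

Dividing the result by the sample rate gives the period in seconds.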

Yes, exactly.

Correct, I just need to work on single shockwave events (calculation of the wavelength below, normally in µs).

Basically, if I know the shockwave wavelength + the shockwave timestamp + the muzzle blast timestamp, BAM, I get the bearing of the shooter.
If this event is heard by one or more additional sensors, BAM, I can triangulate/trilaterate the shooter’s position.

Method 2: if only muzzle blasts are heard, with their timestamps + at least 3 sensors I can also triangulate/trilaterate the shooter’s position.