Cough Counting Impulse/Sample Advice

So I’ve been working in my spare time on building an impulse and an application that interfaces with the API to handle sending the data and retrieving the classification results.

All’s well: I have an impulse that can detect the number of coughs in an X-length sample with about 76% accuracy. But I realized something: the samples in my cough dataset range in length from 1 second to 1 minute. So I’ll have a 1 second sample with one or two coughs, a 1 minute sample with about 40 coughs over that period of time, and different permutations of this, etc.

But my question is: if I wanted to increase the accuracy and the ability to detect the NUMBER of coughs, not just a coughing event, should I strip all of my data down into 500-1000 ms samples of just single coughs? Or is it better/okay to have longer samples with noise in between coughs?

Also, in terms of my window parameters, I’ve done a lot of testing and the smallest of changes can COMPLETELY alter the classification results. Does anyone have any advice in terms of finding a good set of parameters for a task like this?

Right now, one cough usually fits into a window of about 400-500 ms on average. So in order not to over-count a single cough, I usually set my window increase to more than 50% of the window size, around 350-400 ms for instance. Is this advisable? Is there something better I could be doing? Any tips or advice in terms of building a robust and accurate impulse?
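To make the window size / window increase trade-off concrete, here is a minimal sketch of the arithmetic. The helper names and the 2 second sample are assumptions for illustration (they are not part of any Edge Impulse API); the 500 ms window and 400 ms increase are the figures from the post.

```python
def window_starts(sample_ms, window_ms, increase_ms):
    """Start offsets (ms) of every full window that fits in the sample."""
    if sample_ms < window_ms:
        return []
    return list(range(0, sample_ms - window_ms + 1, increase_ms))


def windows_overlapping(event_start_ms, event_end_ms, starts, window_ms):
    """Start offsets of the windows that see at least part of the event."""
    return [s for s in starts
            if s < event_end_ms and event_start_ms < s + window_ms]


# Assumed example: a 2 s sample with one cough lasting from 350 ms to 750 ms.
coarse = window_starts(sample_ms=2000, window_ms=500, increase_ms=400)
fine = window_starts(sample_ms=2000, window_ms=500, increase_ms=100)

print(coarse)                                        # window start offsets
print(windows_overlapping(350, 750, coarse, 500))    # windows touching the cough
print(len(windows_overlapping(350, 750, fine, 500)))
```

With the larger increase only a couple of windows touch the cough, while a small increase lets many windows see the same cough, which is where over-counting tends to come from. Even with a large increase a cough can still straddle two windows, so the window increase alone does not rule out double counts.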

Again, not attempting to detect coughing EVENTS but rather the number of coughs in a sample.

@DoitfortheLulz Wouldn’t a better approach be to tackle this as an events problem? Then counting is easy: just count the number of times the model was triggered. I think here:

  1. Using the ‘Split sample’ function to cut out the coughs and normalize to 500ms (so all cough samples are 500ms.)
  2. Train an impulse with window length 500 ms.
  3. Run the impulse, count how often ‘cough’ was detected.

Would be best.
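Step 3 can be sketched roughly as follows, assuming you already have one ‘cough’ score per window from the classifier. The function name, the 0.8 threshold, and the example scores are assumptions for illustration, not values produced by Edge Impulse.

```python
def count_coughs(cough_scores, threshold=0.8):
    """Count runs of consecutive windows whose 'cough' score is at or above
    the threshold. A run of adjacent hits is treated as one cough."""
    count = 0
    previous_hit = False
    for score in cough_scores:
        hit = score >= threshold
        if hit and not previous_hit:
            count += 1  # a new cough starts on a miss-to-hit transition
        previous_hit = hit
    return count


# Hypothetical per-window 'cough' scores for one recording.
scores = [0.05, 0.91, 0.88, 0.10, 0.02, 0.95, 0.07, 0.85]
print(count_coughs(scores))
```

Merging adjacent hits avoids counting one cough twice when it spills into the next window, at the cost of merging two coughs that land in back-to-back windows, so the threshold and window increase would need tuning against labelled recordings.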

@janjongboom

This is what I ended up doing. I originally had longer samples with coughing throughout, but the noise/coughing overlap was starting to affect my results as my training set grew. Once I split all of the cough samples so that they outlined only the coughing events (utilizing the shift samples option, in order to get some of the beginning and end of each cough as well), my impulse accuracy jumped from ~70% to about 95%.

Quick question, what is the difference between accuracy and val_accuracy that is displayed in the logs when training the NN classifier?

So during training we take 20% of your training set and set it aside (the validation set). Accuracy is calculated on the data the network trains on, and val_accuracy on the validation set, so comparing the two shows whether the network is overfitting. In an ideal case these numbers are close together, but if you see a large discrepancy you can change the learning rate or add some dropout.
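As a rough illustration of the hold-out idea, here is a minimal sketch in plain Python. The function names, the shuffle, and the fixed seed are assumptions for illustration; only the 20% figure comes from the explanation above, and this is not how the training pipeline is actually implemented.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# 100 hypothetical sample ids: 80 end up in training, 20 in validation.
train_ids, validation_ids = train_validation_split(range(100))
print(len(train_ids), len(validation_ids))
```

Scoring the model with `accuracy` on the training portion and again on the held-out portion gives the two numbers to compare; a training score far above the validation score is the usual sign of overfitting.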