Detection of repeated tones?

Hi, my project is a kitchen monitor.
All sorts of appliances make beeps: dishwasher, washing machine, microwave, oven timer, and most importantly fridge/freezer alarm.

I would like eventually to be able to tell all of these apart, but I have started small with the fridge alarm versus background noise (which includes radio and mixer etc.).

I have 2 minutes of noise and 1m 40s of fridge alarm. I know this isn’t really enough.

The fridge alarm beeps are spaced roughly as 1s tone, 1.5s silence, 1s tone, so I went for a window of 2.5s and an interval of 0.5s, as I wanted to ensure I would get at least part of two tones in each sample.

I used the default values of the audio classification example for everything else.

I am only getting 60% accuracy, but before I go blindly tweaking params or gathering more data I thought I should check that the model is able to pick up repetition (and its periodicity) as a feature, and therefore has a chance of distinguishing the alarms.

Thanks

@dansitu could help you further here!

Hi Jason,

This sounds like an interesting problem! Since you are attempting to distinguish between beeps, you’ve correctly identified that the model needs to be aware of the gaps between the beeps, and how long they are.

The default audio model in Edge Impulse uses 1D convolutions with a relatively small kernel size, which means that it is mostly sensitive to the frequency distribution of sounds within each short slice of audio (as configured in the “frame length” setting of the MFCC block, which is 0.2 seconds by default), but not so much to the way that the audio changes over time. This works great if it’s the “texture” of the audio we care about (for example, discerning between background noise and the sound of running water), but it won’t work as well if timing is important. To incorporate more timing information, we can use 2D convolutions, which look across both axes of the data.
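To make the two approaches concrete, here is a quick standalone sketch; the 125-frame count is just a made-up example (the real number depends on your window length and MFCC settings), but it shows how a 1D convolution only slides along the time axis while a 2D convolution slides across both time and frequency:

import numpy as np
from tensorflow.keras import layers

frames, coeffs = 125, 13  # hypothetical frame count; 13 MFCC coefficients per frame
x = np.random.rand(1, frames, coeffs).astype('float32')

# Conv1D treats the 13 coefficients as channels: a kernel_size=3 filter
# only ever sees 3 consecutive frames, i.e. a short slice of time.
print(layers.Conv1D(8, kernel_size=3, padding='same')(x).shape)   # (1, 125, 8)

# Conv2D slides a 5x5 kernel across both the time and coefficient axes,
# so it can pick up patterns in how the audio changes over time.
x2 = x.reshape((1, frames, coeffs, 1))
print(layers.Conv2D(8, kernel_size=5, padding='same')(x2).shape)  # (1, 125, 13, 8)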

I’m actually working on adding 2D convolutions to the model builder UI right now, but until that’s ready you could try copying and pasting this model into the NN block’s expert view. To open the expert view, click this button:

[screenshot: the expert view button in the NN block]

You can then paste in the following model:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer, Flatten, Reshape, Conv2D, AveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.constraints import MaxNorm

# model architecture (X_train, Y_train, X_test, Y_test and classes are
# provided by the Edge Impulse training environment)
model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1], ), name='x_input'))
# reshape the flat MFCC features into (frames, coefficients, channels) so the
# 2D convolutions can look across both time and frequency; 13 is the number
# of MFCC coefficients per frame
model.add(Reshape((int(X_train.shape[1] / 13), 13, 1)))

model.add(Conv2D(10, kernel_size=5, activation='relu', padding='same', kernel_constraint=MaxNorm(3)))
model.add(AveragePooling2D(pool_size=2, padding='same'))

model.add(Conv2D(5, kernel_size=5, activation='relu', padding='same', kernel_constraint=MaxNorm(3)))
model.add(AveragePooling2D(pool_size=2, padding='same'))

model.add(Flatten())
model.add(Dense(classes, activation='softmax', name='y_pred', kernel_constraint=MaxNorm(3)))

# this controls the learning rate ('learning_rate' replaces the deprecated 'lr' argument)
opt = Adam(learning_rate=0.005, beta_1=0.9, beta_2=0.999)

# train the neural network
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=32, epochs=9, validation_data=(X_test, Y_test), verbose=2)

To find the smallest model that works, you can experiment with the number of layers, the number of filters and kernel size for each layer. A network based around 2D convolutions will be a lot larger than the equivalent 1D convolutional network, which is why we use the 1D by default.

You should also adjust the number of training epochs until you see that the model’s validation accuracy is no longer improving.
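If you'd rather not tune the epoch count by hand, one option is an early stopping callback. This is just a sketch, assuming a recent enough TensorFlow that the validation metric is reported as val_accuracy (older versions call it val_acc):

from tensorflow.keras.callbacks import EarlyStopping

# stop once validation accuracy hasn't improved for 3 epochs,
# and roll back to the best weights seen so far
early_stop = EarlyStopping(monitor='val_accuracy', patience=3,
                           restore_best_weights=True)

model.fit(X_train, Y_train, batch_size=32, epochs=100,
          validation_data=(X_test, Y_test), verbose=2,
          callbacks=[early_stop])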

As I mentioned, we’re making improvements this week that will allow you to build these types of models without having to use the advanced view. But in the meantime, let me know how you get on with this model!

Warmly,
Dan

Thank you very much for this. I have been trying it out over the last couple of days and have found that although it now does a much better job of classifying various types of noise against a beep alarm, it is unfortunately not succeeding at distinguishing the different types of alarm.

I have added more test data so I have fairly even amounts of each; it amounts to just over 5 minutes, so more may help, but it is quite difficult to acquire as my wife is fed up with me leaving the freezer door open :wink: .

I also increased the training epochs to 21 where the validation accuracy plateaued.

Here are my results

And against the test set it unfortunately fared worse, with fridge and washing_machine_finish frequently confused:

Were you able to complete the work you mentioned, and if so is it worth trying for this case?

I may be struggling with this specific case, but I have to say what a great job you guys have done in making this process as easy as possible! I’ll be spreading the word :+1:

Thanks,
Jason

Hi Jason,

You’re getting pretty nice training performance, so I’m wondering here if the issue is your minimum confidence rating:

This is the minimum probability a class must have before it is considered a match. The confusion matrix in the neural network block doesn’t factor in this number, but the model testing page does, so maybe you can get better results by reducing the number? Try a value of 0.6 and see how it goes!
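Conceptually, the threshold just gates the highest-probability class. Here is a rough sketch (the class names and probabilities are made up for illustration, not pulled from your project):

import numpy as np

# hypothetical softmax output for a single window
labels = ['fridge', 'washing_machine_finish', 'noise']
probs = np.array([0.65, 0.20, 0.15])

min_confidence = 0.6  # the value suggested above

best = int(np.argmax(probs))
if probs[best] >= min_confidence:
    print(labels[best])   # 0.65 >= 0.6, so this window counts as 'fridge'
else:
    print('uncertain')    # at a higher threshold this branch would fire instead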

We’ve also landed some features in the neural network editor so you can create the same network without having to edit the raw Keras code. Here’s an example of how you can configure it:

Try playing around with these layers and see if you can get something that works a little better.

Warmly,
Dan

@dansitu Actually I don’t think this is the case because the testing page does not indicate ‘uncertain’ but rather misclassifies.

@jasonbsteele, I think this is a data issue: your 3m40s dataset is too small to generalize well.

Ah, yeah, good call—I didn’t look closely enough at the results!

Agreed that adding more data will almost certainly help. It’s also important to make sure that your training and test data are similar; if your test data happens to include background noise, etc. that isn’t present in the training data, your model won’t learn to account for it.

That is how much I had in my OP; my later posts go on to say that I now have over 5m. This has also evened up the ratios of data for each label.

But I take your point, and this may still be too low. Thanks

I have used more data and excluded some of the data recorded on my phone as this seemed to be causing confusion. I also did a bit of editing to make the clips more representative.

And the results are looking much better! So thanks for your help.

One thing that is happening now, however, is that quite a few of my noise test cases are coming up as “uncertain”. These test cases are a variety of samples: talk radio, music, spin cycle, etc.

Will there be problems with trying to classify something with such a wide variety of data?

In reality it shouldn’t be a problem as I will ignore noise and “uncertain”, but I wondered if you had any guidance on this?

Thanks,
Jason

@jasonbsteele, I think uncertain would be OK for noise. Maybe as some background, this is how I built a classifier that could reliably detect events (via https://www.edgeimpulse.com/blog/audio-based-shower-timer-with-a-phone-machine-learning-and-webassembly/):

With the model available as a WebAssembly library we can tie it to the WebAudio APIs in the browser, and then live-classify what is happening. For a robust model you don’t want to rely on a single classification, so instead my approach is (source code):

  1. Every second we take a sliding window over the last 5 seconds of data, with an increment of 250 ms. This gives 17 windows. Classify each of these.
  2. To classify this 5-second span:
  • If 80% of the windows are classified as ‘shower’ => shower is on.
  • If 60% of the windows are classified as ‘noise’ => shower is off.
  • Else, the state is uncertain.
  3. These results are stored (so you have a new one every second, containing a classification over the last 5 seconds), and depending on the last 5 results:
  • If 3/5 are classified as ‘shower’ => show in the UI that the shower is on.
  • If 3/5 are classified as ‘noise’ => show in the UI that the shower is off.
  • Else, don’t change the UI.

This adds a bit of delay, as it might take 10 seconds to determine that the shower was turned on or off, but it is very resilient and a lot more accurate than looking at single windows. In my tests we were at most ~5 seconds off from the actual shower time.
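If it's useful, here is a rough Python sketch of that two-stage voting (the label names are placeholders, and the real implementation linked above runs in the browser):

from collections import deque

def classify_span(window_labels):
    """Fuse the 17 per-window labels covering one 5-second span."""
    n = len(window_labels)
    if window_labels.count('shower') / n >= 0.8:
        return 'shower'
    if window_labels.count('noise') / n >= 0.6:
        return 'noise'
    return 'uncertain'

# one fused result arrives per second; keep the last 5
recent = deque(maxlen=5)

def update_ui(span_result):
    recent.append(span_result)
    if list(recent).count('shower') >= 3:
        return 'shower on'
    if list(recent).count('noise') >= 3:
        return 'shower off'
    return None  # don't change the UI

# example: feed in one fused result per second
for span in ['uncertain', 'shower', 'shower', 'shower', 'noise']:
    print(update_ui(span))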
