How to define signal processing and neural networks for audio recognition?

Hi,

First, congratulations on your tutorials: they are very clear.
Secondly, the installation process and connection of the ST device to Edge Impulse are very intuitive.

I am working on a “smart trash” project. My goal is to detect the nature of a thrown object (glass, metal, paper, etc.) by its sound.
I plan to use the Edge Impulse tool to record 50 one-second sound samples per class, each containing the sound of the object. However, I am a beginner in signal processing and deep learning.
From the tutorial/video called “Recognize sounds from audio”, I understood that I first have to calculate the Mel-Frequency Cepstral Coefficients (MFCC).
My first question is: is this methodology relevant for my use case? If not, can you advise a better method to perform what I want to do?

My second question is: how do I set the parameters in the Edge Impulse tool to compute the MFCC?

My third question is: what would be a relevant and performant neural network architecture, and the associated parameters, for my use case?

Thank you for your attention,

Regards,

Lionel


Hi @krukiou, very interesting use case! Yes, I’d follow the same steps as the ones in the “Recognize sounds from audio” tutorial. It will give you a pretty good idea of whether the machine learning model works. The default parameters in the MFCC block should work well enough for your use case, and you can pair it with a small neural network like this:
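(The screenshot of the architecture settings didn’t survive here. As a rough sketch of what such a small network could look like in Keras — not necessarily the exact block Edge Impulse generates; X_train, Y_train and classes are names from the Edge Impulse training environment, as shown in the expert-mode code later in this thread:)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer, Dropout, Conv1D, Flatten, Reshape, MaxPooling1D

# minimal sketch of a small audio classifier on MFCC features
model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1],), name='x_input'))
# reshape the flat feature vector into (frames, 13 MFCC coefficients)
model.add(Reshape((int(X_train.shape[1] / 13), 13)))
model.add(Conv1D(10, kernel_size=5, activation='relu', padding='same'))
model.add(MaxPooling1D(pool_size=2, padding='same'))
model.add(Flatten())
model.add(Dropout(0.25))
model.add(Dense(classes, activation='softmax', name='y_pred'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])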

I wouldn’t modify the parameters for either the MFCC block or the neural network just yet. Add data (I think 50 sound samples is too low, but let’s try it out!) and retrain until you have >80% accuracy; from there we can see if you need more data, parameter tweaking, or a slightly different neural network. And keep us posted!


Hi Jan,

Thank you for your answer and your advice!
I will record the sounds to build my training dataset, then do the signal processing, and then build the neural network following your recommendations.
I will keep you posted in the first part of next week…

Have a nice weekend,

Regards,

Lionel

Hi Jan,

I decided to follow your advice to increase the size of my dataset, and I increased it significantly. The dataset now counts 300 sound samples per class (instead of the 50 initially planned). I have 5 classes, so my dataset counts 1,500 sound samples in total. I’m about to start creating the model.
I plan to split my dataset into two parts:

  • a training set (80% of the dataset)
  • a test set (20% of the dataset) to validate the model.

My questions are:

  1. Do you think this method is reasonable? What do you think?
  2. Is the performance displayed at the end of the training phase evaluated on a validation set, or is it the training error? In other words, are you using a fraction of the training set to validate and display the performance of the model? If so, what fraction of the training set is used for validation?

Thank you for your attention,

Have a nice weekend!

Regards,

Lionel

Hi @krukiou, yes, that sounds reasonable.

  1. Is the performance displayed at the end of the training phase evaluated on a validation set, or is it the training error? In other words, are you using a fraction of the training set to validate and display the performance of the model? If so, what fraction of the training set is used for validation?

During training we split your training set into a training and a validation set (80/20), and the performance displayed (accuracy / loss) is the performance on the validation set. You get the performance on the test set from the ‘Model testing’ screen. We’ve split this up because models might have multiple learning blocks, and the model testing screen gives you an overview of all of them.
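For illustration, the split described above works like this (my own sketch using scikit-learn, not Edge Impulse’s internal code; stratify keeps the class balance equal in both parts):

import numpy as np
from sklearn.model_selection import train_test_split

# dummy stand-ins for the 1,500-sample, 5-class dataset described above
X = np.random.rand(1500, 13 * 99)   # flattened MFCC features (sizes illustrative)
Y = np.repeat(np.arange(5), 300)    # 5 classes, 300 samples each

# 80/20 train/validation split, as done internally during training
X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y, test_size=0.20, random_state=42, stratify=Y)

print(X_train.shape, X_val.shape)   # (1200, 1287) (300, 1287)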


Hi Jan,

OK, understood.
Thank you for your answers!

Regards,

Lionel

Hi Jan,

I began creating the model with my dataset, but I encountered two errors.
My training set consists of several hundred sound samples of 2 seconds each.
I set:

  • Window size = 200 ms
  • Window step = 20 ms

  1. When I launch the feature generation, the script raises the following error during execution:

ERR: DeadlineExceeded - Job was active longer than specified deadline

Job failed (see above)

When I set the window step to 100 ms, the script runs without error.
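(For context on why the step matters so much: the step size controls how many overlapping windows are extracted per sample, so a small step multiplies the DSP work. A rough back-of-the-envelope sketch — my own illustration, not Edge Impulse code; the exact windowing may differ:)

# approximate number of sliding windows per sample
def n_windows(sample_ms, window_ms, step_ms):
    return (sample_ms - window_ms) // step_ms + 1

for step in (20, 100):
    per_sample = n_windows(2000, 200, step)
    print(f"step={step} ms -> {per_sample} windows per 2 s sample, "
          f"{per_sample * 1500} windows over 1500 samples")
# step=20 ms  -> 91 windows per sample, 136500 in total
# step=100 ms -> 19 windows per sample,  28500 in total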

  2. After generating the features, when I launch the training of the neural network, the script immediately raises the following error:

EDIT: I solved this second error by setting the kernel size to 1 (instead of the default of 5).
With that change, the training runs successfully.
However, I obtain disappointing results (accuracy = 30%). How can I improve the performance of the neural network?

Creating job… OK (ID: 160025)
Copying features from processing blocks…
Copying features from processing blocks OK
Training model
Job started

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1619, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Negative dimension size caused by subtracting 5 from 1 for 'conv1d_1/conv1d' (op: 'Conv2D') with input shapes: [?,1,1,30], [1,5,30,10].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/train.py", line 113, in <module>
    model = train_model(X_train, Y_train, X_test, Y_test)
  File "/home/train.py", line 31, in train_model
    model.add(Conv1D(10, kernel_size=5, activation='relu'))
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 203, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 773, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 209, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1135, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 640, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 239, in __call__
    name=self.name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 228, in _conv1d
    name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 574, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 574, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1682, in conv1d
    name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 969, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/func_graph.py", line 595, in _create_op_internal
    compute_device)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1786, in __init__
    control_input_ops)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1622, in _create_c_op
    raise ValueError(str(e))
ValueError: Negative dimension size caused by subtracting 5 from 1 for 'conv1d_1/conv1d' (op: 'Conv2D') with input shapes: [?,1,1,30], [1,5,30,10].

Application exited with code 1
Job failed (see above)

What is going on? Can you help me fix these two issues?

Thank you,

Regards,

Lionel

Is it possible that the sounds are not detectable in 200 ms windows? If you’ve sliced the data up into 2-second windows, it might be better to use 2 seconds as the window size and see what the accuracy is.
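(For reference, the ValueError above is a shape problem: a 200 ms window apparently yields a single MFCC frame, and a Conv1D with the default kernel_size=5 cannot slide over a length-1 sequence — that is what “subtracting 5 from 1” refers to. A minimal sketch reproducing it; illustration only, with the 30-channel input taken from the shapes in the traceback, and the exact error text varying by TensorFlow version:)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Conv1D

# with only one time step, a kernel_size=5 convolution cannot be built:
try:
    model = Sequential()
    model.add(InputLayer(input_shape=(1, 30)))   # 1 frame, 30 channels
    model.add(Conv1D(10, kernel_size=5, activation='relu'))
except ValueError as e:
    print(e)  # Negative dimension size caused by subtracting 5 from 1 ...

# a longer window yields many MFCC frames, and the same layer builds fine:
model = Sequential()
model.add(InputLayer(input_shape=(99, 30)))      # 99 frames (illustrative)
model.add(Conv1D(10, kernel_size=5, activation='relu'))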


Hi Jan,
Thank you for your clever answer!
Indeed, your suggestion to increase the window size to 2 s was relevant: it raised the validation accuracy to around 55%.
Then I switched to Keras mode in the “NN Classifier” window and added:

  • a kernel_regularizer
  • Dropout layers
  • and I played with the parameters

My code :

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer, Dropout, Conv1D, Flatten, Reshape, MaxPooling1D, BatchNormalization
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adam

# model architecture
model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1], ), name='x_input'))
# reshape the flat feature vector into (frames, 13 MFCC coefficients)
model.add(Reshape((int(X_train.shape[1] / 13), 13), input_shape=(X_train.shape[1], )))
model.add(Conv1D(30, kernel_size=1, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=1, padding='same'))
model.add(Conv1D(10, kernel_size=1, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=1, padding='same'))
model.add(Flatten())
model.add(Dense(classes, activation='softmax', name='y_pred'))

# this controls the learning rate
opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999)
# opt = Adadelta(learning_rate=1.0, rho=0.95)

# train the neural network
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=50, epochs=200, validation_data=(X_test, Y_test), verbose=2)

The best I obtained after 200 training cycles is the following:

[chart: training vs. validation accuracy]

As you can see, there is still a significant difference between the training accuracy and the validation accuracy. How can I continue to improve my validation accuracy ?

Thank you for your answer(s),

Regards,

Lionel

@dansitu probably has some good ideas.

Some notes:

  1. Your model is overfitting. Moving your dropout layers to after the max pooling layers instead will probably help a bit (see the sketch below).
  2. Does your accuracy go up if you simplify the problem? E.g. try two classes (you can create a new project with some data by exporting from the dashboard, then importing with the uploader).
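(For note 1, the change would look like this in your expert-mode code above — same layers, only the Dropout/MaxPooling1D order swapped, for each of the two convolutional stages:)

# sketch: dropout applied after pooling instead of before it
model.add(Conv1D(30, kernel_size=1, activation='relu',
                 kernel_regularizer=regularizers.l2(0.001)))
model.add(MaxPooling1D(pool_size=1, padding='same'))
model.add(Dropout(0.5))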

Hi there, thanks for using Edge Impulse :slight_smile:

It definitely seems like your model is overfitting. One way to fight this is to reduce the size of your model, which can force it to learn the most basic underlying relationships in your data rather than relying on memorizing features that are unique to your training dataset.

Here are some things you can try:

  • Reduce the number of filters in your convolutional layers
  • Remove one of the convolutional layers (and its max pooling layer) entirely
  • Experiment with adding more dropout, including after the input layer. However, I would avoid it after the final MaxPooling1D layer (feel free to experiment with this, though)

You should also try adding more training cycles until you’re sure performance is plateauing.

And it never hurts to have more data!
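(Putting these suggestions together, a slimmed-down variant of the expert-mode model might look like the sketch below — illustrative only: one convolutional layer, fewer filters, dropout after the input, none after the final pooling layer, and more training cycles; all specific values are guesses to tune against your data:)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer, Dropout, Conv1D, Flatten, Reshape, MaxPooling1D
from tensorflow.keras.optimizers import Adam

# smaller model: one conv stage, fewer filters, dropout after the input
model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1], ), name='x_input'))
model.add(Dropout(0.1))
model.add(Reshape((int(X_train.shape[1] / 13), 13)))
model.add(Conv1D(8, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(pool_size=2, padding='same'))
model.add(Flatten())
model.add(Dense(classes, activation='softmax', name='y_pred'))

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.005, beta_1=0.9, beta_2=0.999),
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=50, epochs=400,
          validation_data=(X_test, Y_test), verbose=2)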


Hi Dan, Hi Jan,

Thank you for your messages. I will try your recommendations.
But first I will make additional sound recordings to increase the size of my dataset.

Regards,

Lionel

Hi Jan,

Since today I have been encountering an error with your tool.
During the training of the neural network, after several dozen epochs, I get the message “Terminated by user”, and after that the script raises an error.

What is going on? Can you help me fix this bug?

Thank you,

Regards,

Lionel

Hi (again) Jan,

I have a question:
When I set different values for the “minimum confidence rating” in “Model testing”, I get different accuracy results (when I increase the minimum confidence rating, the accuracy decreases), which is perfectly logical.
However, during the training of the neural network, the training accuracy and the validation accuracy are exactly the same whatever the value of the minimum confidence rating, so during training the calculated accuracy seems to be independent of it…

Can you explain this behavior?

Thank you,

Regards,

Lionel

Hi @krukiou ,

Yes, the minimum confidence rating is not used during the training phase; it is only used to compute the model accuracy in the model testing screen. The option should probably be moved to that screen instead, as it’s a bit confusing.
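(To illustrate the effect you’re seeing in model testing — my own sketch of the idea, not Edge Impulse’s actual scoring code: a sample only counts as correct when the top prediction matches the label and its confidence clears the threshold, so raising the threshold can only lower the measured accuracy:)

import numpy as np

def thresholded_accuracy(probs, labels, min_confidence):
    # top predicted class and its confidence for each sample
    top = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    # correct only if the class matches AND the confidence is high enough
    correct = (top == labels) & (conf >= min_confidence)
    return correct.mean()

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 0, 1])
print(thresholded_accuracy(probs, labels, 0.5))  # 1.0: all confident and right
print(thresholded_accuracy(probs, labels, 0.8))  # ~0.67: the 0.6-confidence sample drops out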

Does this mean that training succeeded again?

Jan,

Thank you for your quick answer: OK, I understand.
Regarding the bug in the training phase: no, since today the training systematically raises the error I described in my other post, so I can currently no longer run my tests…

Regards,

Lionel

@krukiou, hmm… @mathijs and I have been looking at the logs, but have not found the root cause yet. The jobs seem to properly run when looking at the logs, but show a failed message in the UI. I’ve just tested running the training job on your account and now don’t see the issue - that makes debugging harder!

However, I saw that even though the error message appeared, the job keeps running underneath, so after a while the metadata at the bottom should update regardless (you’ll need a page refresh). If you still run into this right now, please let me know; that would help us pinpoint the underlying issue.

Jan, @mathijs ,

I just launched a couple of runs and now the training is running successfully: everything is fine!
Thank you very much for your help!

Have a nice weekend.

Regards,

Lionel

Thanks! We’ve added some logging to ensure we’ll be able to find the root cause once this happens again. :rocket:

I did a similar test project for ambient sound classification. At first I used two labels, “Environmental Noise” and “Motor Vehicle”. Later I introduced a third label, “Alert Sound” (from a buzzer, for my use case). At first I had some overfitting issues and had to perform some data cleansing and reduce the data window to 150 ms. I also incorporated K-means anomaly detection, and I’m running it on an STM32L4 at the edge. It works perfectly: even when sounds are combined, the predominant sound prevails, although it still detects the other variables.

@janjongboom Good job guys!