I noticed that every audio fragment I recorded starts with loud clipping when I play it back.
Most likely this is because the microphone is switched on and immediately starts recording.
In the example below you can see very high values (around 20000) at the start, while the rest of the audio stays well below 5000.
So I think it would be good to skip this part when training or testing the model.
Is there a way to do so? If not, it would be great if some day it were possible to specify a "skip time" for training and testing.
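
A possible workaround in the meantime might be to trim the start of each recording before it is used for training or testing. Below is a minimal sketch of that idea in plain Python. It assumes the audio is available as a sequence of integer samples with a known sample rate. The names `skip_start` and `skip_ms` are hypothetical and not part of any existing API, and the 16 kHz rate and 100 ms skip in the usage example are illustrative values only.

```python
def skip_start(samples, sample_rate, skip_ms):
    """Return the samples with the first `skip_ms` milliseconds removed.

    samples     -- sequence of audio samples (e.g. a list of 16-bit ints)
    sample_rate -- samples per second (assumed value, e.g. 16000)
    skip_ms     -- hypothetical "skip time" in milliseconds
    """
    if skip_ms < 0:
        raise ValueError("skip_ms must not be negative")
    # Convert the skip time to a number of samples.
    n_skip = int(sample_rate * skip_ms / 1000)
    # Slicing past the end just yields an empty sequence.
    return samples[n_skip:]


# Usage: a fake 1-second recording whose first 100 ms contain a loud spike.
recording = [20000] * 1600 + [3000] * 14400
trimmed = skip_start(recording, sample_rate=16000, skip_ms=100)
print(len(trimmed), max(trimmed))
```

A fixed skip time is the simplest option. If the length of the spike varies between recordings, a threshold on the sample values could be used instead to decide where to cut.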