Audio samples: mono vs stereo


I am processing audio which was recorded on location in stereo. The model will be deployed to an Arduino Nano 33 BLE Sense with a MEMS microphone.

How does uploading audio in stereo affect the training of the model, versus audio being recorded in mono on the Arduino and fed to the trained model?


Hello. If you collect data in stereo, train the model, and get a good result, the model will not work correctly once it is transferred to the microcontroller and fed mono audio there: it was trained on stereo, but you are giving it mono sound. Also pay attention to the sampling rate, which should be the same for training and inference. As for why mono recording is used on microcontrollers, the answer is simple: microcontrollers have little memory, and mono saves memory and improves processing speed. Best regards, Norik.
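One way to avoid the stereo/mono mismatch is to downmix the stereo training data to mono before uploading it, so training and deployment see the same single-channel signal. A minimal sketch of a downmix (averaging each left/right sample pair; the function name and the plain-list representation are my own illustration, not from any particular library):

```python
def stereo_to_mono(left, right):
    """Downmix two channel buffers to one by averaging each sample pair."""
    return [(l + r) / 2 for l, r in zip(left, right)]

# Tiny two-channel example buffer (integer samples for clarity)
left = [2, 4, -1]
right = [0, 4, 3]
print(stereo_to_mono(left, right))  # [1.0, 4.0, 1.0]
```

Most audio tools (Audacity, ffmpeg, librosa) can do this conversion for you when exporting; the point is simply that the model should be trained on the same channel layout it will see on the device.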

Hi @norik.badalyan, if our target is a Raspberry Pi 4, should we change to mono instead of stereo? Which of the two gives better accuracy?


Hello. I always use mono on microcontrollers and get good results, because using two channels is very labor-intensive for small chips. On a Raspberry Pi 3 I trained a Python model on a stereo dataset from the Internet at SR = 44.1 kHz; of course it was hard for the device, so I lowered the SR to 16 kHz and the window to 1 second. With Edge Impulse you can use mono at SR = 16 kHz and get good results. Between mono and stereo during training I did not see a huge difference, apart from the savings in memory and speed. Best regards, Norik.
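Lowering the SR as described above means resampling the recordings, e.g. from 44.1 kHz down to 16 kHz. A minimal sketch of a naive linear-interpolation resampler (illustrative only; it skips the anti-aliasing low-pass filter that a real pipeline such as `scipy.signal.resample_poly` or librosa applies first, and the function name is my own):

```python
def resample_linear(samples, sr_in, sr_out):
    """Naive resampler: linearly interpolate input samples at the new rate.

    No anti-alias filtering is done, so frequencies above sr_out/2
    will alias; use a proper DSP library for production audio.
    """
    n_out = int(len(samples) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out          # fractional index into the input
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out

one_second = [0.0] * 44100                 # 1 s of silence at 44.1 kHz
print(len(resample_linear(one_second, 44100, 16000)))  # 16000
```

The key point is that whatever rate you train at must match the rate the device captures at, as noted above.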

Hello @norik.badalyan, thanks for the answer. Besides mono vs. stereo, I am also unsure about the sample frequency: 44.1 kHz, 22 kHz, or 16 kHz, since I learned that humans can hear from 20 Hz to 20 kHz. What I know is that a higher sample rate gives more detail but the computational cost is heavier. Based on your experiments on the RPi 3, is it still feasible to go above 16 kHz? How about the accuracy?


Hi @desnug1, yes, of course you can use a higher sampling rate, but higher is not always better: it all depends on the model architecture and a good dataset. I have used 44.1, 22, 16, and even 8 and 4 kHz (for very weak chips). Of course, the lower the frequency, the greater the high-frequency losses, which can complicate training of the neural network since there will be fewer features; but this can be compensated for by augmenting the data, using several processing blocks, or changing the model architecture. Best regards, Norik.
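The high-frequency loss mentioned above follows from the Nyquist theorem: a signal sampled at rate SR can only represent frequencies up to SR/2. A tiny sketch tabulating this for the rates discussed in this thread:

```python
def nyquist(sr_hz):
    """Highest frequency (Hz) representable at a given sample rate."""
    return sr_hz / 2

# Sample rates mentioned in the thread, from CD quality down to very weak chips
for sr in (44100, 22050, 16000, 8000, 4000):
    print(f"{sr:>5} Hz sampling -> content above {nyquist(sr):.0f} Hz is lost")
```

So at 16 kHz everything above 8 kHz is discarded, which is usually fine for speech and many keyword tasks, while music or wideband sound events may benefit from higher rates if the device can afford them.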