ESP32 Arduino examples - example code issues

Question/Issue:
Bug/typo in the continuous microphone example code

Project ID:
867419

Context/Use case:
Trying to detect a ringtone-like sound

Summary:
There is a mix-up between the number of bytes and the number of samples for the audio buffer used in the continuous microphone example (and probably also in the non-continuous one).

The sample buffer is created with 2048 samples, but only half of the buffer is actually used:

static const uint32_t sample_buffer_size = 2048;
static signed short sampleBuffer[sample_buffer_size];

In the record task (capture_samples), the argument is used directly as 'bytes to read':

 xTaskCreate(capture_samples, "CaptureSamples", 1024 * 32, (void*)sample_buffer_size, 10, NULL);
....
const int32_t i2s_bytes_to_read = (uint32_t)arg;
i2s_read((i2s_port_t)1, (void*)sampleBuffer, i2s_bytes_to_read, &bytes_read, 100);

i2s_read needs the number of bytes, not the number of samples.

The simplest fix would be to create the task with sample_buffer_size * sizeof(signed short),
or equivalently sizeof(sampleBuffer):

uint32_t capture_size = sample_buffer_size * sizeof(sampleBuffer[0]);
xTaskCreate(capture_samples, "CaptureSamples", 1024 * 32, (void*)capture_size, 10, NULL);
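To make the bytes/samples mix-up concrete, here is a minimal host-side sketch. It is not the ESP-IDF driver: fake_i2s_read is a hypothetical stand-in that simply writes as many bytes as it is asked to, which is an assumption about how a successful i2s_read behaves. The buffer declaration mirrors the one in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same buffer shape as in the example: 2048 samples of 16 bits each.
static const uint32_t sample_buffer_size = 2048;
static int16_t sampleBuffer[sample_buffer_size];

// Hypothetical stand-in for i2s_read(): writes `bytes_to_read` marker bytes
// into `dest` and reports how many bytes were written.
static void fake_i2s_read(void* dest, size_t bytes_to_read, size_t* bytes_read) {
    assert(dest != nullptr && bytes_read != nullptr);
    std::memset(dest, 0x11, bytes_to_read);
    *bytes_read = bytes_to_read;
}

// Clears the buffer, performs one read of `bytes_to_read` bytes, and returns
// how many 16-bit samples that read actually filled.
static size_t samples_filled(size_t bytes_to_read) {
    assert(bytes_to_read <= sizeof(sampleBuffer));
    std::memset(sampleBuffer, 0, sizeof(sampleBuffer));
    size_t bytes_read = 0;
    fake_i2s_read(sampleBuffer, bytes_to_read, &bytes_read);
    return bytes_read / sizeof(sampleBuffer[0]);
}
```

Calling samples_filled(sample_buffer_size) fills 1024 samples and leaves the second half of the buffer untouched, while samples_filled(sample_buffer_size * sizeof(sampleBuffer[0])) fills all 2048.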

The included example does work (to some extent), but only half of the buffer is used, and the code ‘as an example’ is rather confusing (imho)

There are some other issues with this example.
The microphone data is scaled up, and a lot of bandwidth is wasted:

    // scale the data (otherwise the sound is too quiet)
    for (int x = 0; x < i2s_bytes_to_read/2; x++) {
        sampleBuffer[x] = (int16_t)(sampleBuffer[x]) * 8;
    }

This happens because the I2S microphone uses 24 bits, and when 16 bits is used as the recording setting (on ESP32),
only the top 8 of those 24 bits are captured into the lower 8 bits of the recorded 16-bit sample (so in reality only 8 bits are recorded),
effectively throwing a lot of information away. A better approach would be to capture in 32-bit and then convert to 16-bit 'manually', replacing the up-scaling with a down-scaling from 32 to 16 bits.

Since the audio samples are copied into the inference swap buffers anyway, that may be a more efficient place to perform the scaling. Here is my own implementation for a 24-bit I2S microphone:

Add a clip function, and add a 32-bit sample buffer (or replace the existing buffer with it):


static int32_t sampleBuffer32[sample_buffer_size];
int16_t clip32(int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(value);
}

Configure I2S with 32 bits per sample:


`            .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,`

Perform the conversion while copying to the buffers:

static void audio_inference_callback(uint32_t num_samples)
{
    for (uint32_t i = 0; i < num_samples; i++) {
        // scale the 32-bit sample down by 4096 (2^12), then clip it to the int16_t range
        inference.buffers[inference.buf_select][inference.buf_count++] = clip32(sampleBuffer32[i] / 4096);

        if (inference.buf_count >= inference.n_samples) {
            inference.buf_select ^= 1;
            inference.buf_count = 0;
            inference.buf_ready = 1;
        }
    }
}
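For anyone who wants to check the scaling on a PC first, here is a small self-contained sketch of the same 32-to-16-bit conversion. It assumes the microphone's 24-bit sample is left-justified in the 32-bit word (upper 24 bits); that is an assumption about the microphone and driver, so verify it for your hardware. pack24 and convert_block are hypothetical helper names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Saturate a 32-bit value to the int16_t range (same logic as clip32 above).
static int16_t clip32(int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(value);
}

// Assumption: the 24-bit sample occupies the upper 24 bits of the 32-bit word.
// Multiplying by 256 is used instead of a left shift so that negative values are well defined.
static int32_t pack24(int32_t sample24) {
    return sample24 * 256;
}

// Convert a block of 32-bit I2S words to 16-bit samples:
// divide by 4096 (2^12), then clip to the int16_t range.
static void convert_block(const int32_t* in, int16_t* out, size_t n) {
    assert(in != nullptr && out != nullptr);
    for (size_t i = 0; i < n; i++) {
        out[i] = clip32(in[i] / 4096);
    }
}
```

Under that assumption, dividing by 4096 is the same as dividing the 24-bit value by 16, so a 24-bit sample of 1000 becomes 62, and anything louder than about 1/16 of the 24-bit full scale is clipped to INT16_MAX or INT16_MIN. The divisor is a gain setting; a larger divisor gives more headroom at the cost of a quieter signal.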

Actual Results:
Only half of the audio buffer is used

Environment:

  • Platform: [esp32s3]

One other thing: I think the example assumes we are using a 16 kHz sample rate?
Hopefully this only affects the printing of this line, but as I am investigating why my model doesn't perform very well on the device, I am starting to doubt everything.

ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16);

This could be improved with:

ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT*1000 / EI_CLASSIFIER_FREQUENCY);
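A quick sketch of the arithmetic, to show where the hard-coded / 16 goes wrong. sample_length_ms is a hypothetical helper, not part of the Edge Impulse SDK; the widening to 64 bits is my own precaution against overflow in the multiplication.

```cpp
#include <cassert>
#include <cstdint>

// Window length in milliseconds for `raw_sample_count` samples
// recorded at `frequency_hz` samples per second.
static uint32_t sample_length_ms(uint32_t raw_sample_count, uint32_t frequency_hz) {
    assert(frequency_hz > 0);
    // Multiply first so the integer division does not lose precision.
    return static_cast<uint32_t>((static_cast<uint64_t>(raw_sample_count) * 1000u) / frequency_hz);
}
```

At 16 kHz both formulas agree: 16000 samples give 1000 ms either way. For a one-second window at 8 kHz (8000 samples), the hard-coded version prints 8000 / 16 = 500 ms, while the frequency-based version gives the correct 1000 ms.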