Trouble understanding microphone_continuous

Question/Issue: I have trouble understanding the code, especially regarding the buffer size.

Project ID: 213088

Context/Use case:

While reading the code in examples/esp32/esp32_microphone_continuous/esp32_microphone_continuous.ino, I had the following thoughts.

static void audio_inference_callback(uint32_t n_bytes)
{
    for(int i = 0; i < n_bytes>>1; i++) {
        inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];
        // ... 
    }
}

Here, since we are dealing with 16 bits (2 bytes) per sample (configured in i2s_config), n_bytes>>1 is understandable for now.

audio_inference_callback is called in capture_samples with i2s_bytes_to_read, and i2s_bytes_to_read is the same as arg.

static void capture_samples(void* arg) {

  const int32_t i2s_bytes_to_read = (uint32_t)arg;
  size_t bytes_read = i2s_bytes_to_read;

  // ...
  if (record_status) {
      audio_inference_callback(i2s_bytes_to_read);
  }
  // ...

And, capture_samples is called with sample_buffer_size, which is 2048.

xTaskCreate(capture_samples, "CaptureSamples", 1024 * 32, (void*)sample_buffer_size, 10, NULL);

So, i2s_bytes_to_read is 2048 (= sample_buffer_size).

// static void capture_samples(void* arg)
  for (int x = 0; x < i2s_bytes_to_read/2; x++) {
      sampleBuffer[x] = (int16_t)(sampleBuffer[x]) * 8;
  }

But here, why /2? It seems we are iterating over only half of the buffer.

The same applies here: audio_inference_callback is called with 2048 and the sampleBuffer size is also 2048, so it seems we are iterating over only half of the buffer here too.

static void audio_inference_callback(uint32_t n_bytes)
{
    for(int i = 0; i < n_bytes>>1; i++) {
        inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];
        // ... 
    }
}

Is there something that I am missing?

Thank you for your help.

Hello there!
So at least for

// static void capture_samples(void* arg)
  for (int x = 0; x < i2s_bytes_to_read/2; x++) {
      sampleBuffer[x] = (int16_t)(sampleBuffer[x]) * 8;
  }

this part - sampleBuffer holds int16_t values (it is declared as signed short), but i2s_read counts the bytes it reads from the recording device. Since one int16_t occupies 2 bytes, we divide the number of bytes by two to get the number of int16_t samples that were recorded.
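
The byte-versus-sample distinction can be sketched in plain C, outside the ESP32 environment. The helper names bytes_to_samples and apply_gain are hypothetical and not part of the example sketch; the gain factor simply mirrors the * 8 in the loop above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a byte count (as reported by an I2S read)
   into the number of 16-bit samples it contains. */
static size_t bytes_to_samples(size_t n_bytes)
{
    return n_bytes / sizeof(int16_t);   /* equivalent to n_bytes >> 1 */
}

/* Hypothetical helper: amplify only the samples that were actually read.
   n_bytes is a byte count, so the loop bound is n_bytes / 2 samples. */
static void apply_gain(int16_t *samples, size_t n_bytes, int gain)
{
    size_t n_samples = bytes_to_samples(n_bytes);
    for (size_t i = 0; i < n_samples; i++) {
        samples[i] = (int16_t)(samples[i] * gain);
    }
}
```

With n_bytes = 2048, the loop touches 1024 int16_t elements, which is every sample that a 2048-byte read delivers.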
I do see another small problem - since we instantiate

static const uint32_t sample_buffer_size = 2048;
static signed short sampleBuffer[sample_buffer_size];

we’ll have a buffer of 2048 int16_t values. But we’ll only fill half of it, since i2s_read will read 2048 bytes, or 1024 signed short values.
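
The arithmetic can be checked with sizeof. This is a minimal sketch in plain C that assumes the two declarations quoted above; buffer_capacity_bytes and unused_samples are hypothetical helpers, not functions from the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BUFFER_SIZE 2048u   /* element count, as in the example */

static int16_t sampleBuffer[SAMPLE_BUFFER_SIZE];

/* Capacity of the buffer in bytes: 2048 elements * 2 bytes each. */
static size_t buffer_capacity_bytes(void)
{
    return sizeof(sampleBuffer);
}

/* Hypothetical helper: number of int16_t slots left untouched when only
   bytes_requested bytes are read into the buffer per call. */
static size_t unused_samples(size_t bytes_requested)
{
    size_t capacity = buffer_capacity_bytes();
    if (bytes_requested >= capacity) {
        return 0;   /* the read would fill (or overflow) the buffer */
    }
    return (capacity - bytes_requested) / sizeof(int16_t);
}
```

Under these assumptions the buffer holds 4096 bytes, so a 2048-byte read leaves 1024 elements unused.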

So in essence - there seems to be a problem here, but of a different kind: half of the buffer (1024 samples, i.e. 2048 bytes) is wasted. I’ll check whether that is the case and apply a patch if necessary.
Do you have any issues with microphone inference though? Or does it work correctly for you?

In a related issue, see @beneficial01’s code in this post. Basically, Buffer[] and the Samples Buffer[] were changed from 16-bit to 32-bit just to get the code to record something other than silence.

It works fine. Thank you for the explanation.