Question/Issue:
Bug/typo in Microphone continuous code
Project ID:
867419
Context/Use case:
Trying to detect ring-tone like sound
Summary:
There is a mix-up between the number of bytes and the number of samples for the audio buffer used in the continuous microphone example (and probably also in the non-continuous one).
The sample buffer is created with 2048 samples,
but only half of the buffer is actually used:
static const uint32_t sample_buffer_size = 2048;
static signed short sampleBuffer[sample_buffer_size];
In the record task (capture_samples), the argument is used directly as ‘bytes to read’:
xTaskCreate(capture_samples, "CaptureSamples", 1024 * 32, (void*)sample_buffer_size, 10, NULL);
....
const int32_t i2s_bytes_to_read = (uint32_t)arg;
i2s_read((i2s_port_t)1, (void*)sampleBuffer, i2s_bytes_to_read, &bytes_read, 100);
i2s_read needs the number of bytes, not the number of samples.
The simplest fix would be to call the task with sample_buffer_size * sizeof(signed short)
or sample_buffer_size * sizeof(sampleBuffer[0]):
uint32_t capture_size = sample_buffer_size * sizeof(sampleBuffer[0]);
xTaskCreate(capture_samples, "CaptureSamples", 1024 * 32, (void*)capture_size, 10, NULL);
The included example does work (to some extent), but only half of the buffer is used, and as example code it is rather confusing (imho).
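To make the bytes-versus-samples arithmetic concrete, here is a minimal host-side sketch (no ESP-IDF calls). The names capture_size_bytes and samples_filled are hypothetical helpers introduced only for illustration; the buffer declaration mirrors the example, using int16_t in place of signed short.

```cpp
#include <cstddef>
#include <cstdint>

// Buffer as declared in the example: 2048 samples of 16 bits each.
static const uint32_t sample_buffer_size = 2048;
static int16_t sampleBuffer[sample_buffer_size];

// Number of bytes needed to fill the whole buffer. A byte-oriented read
// such as i2s_read expects this value, not the sample count.
static const size_t capture_size_bytes =
    sample_buffer_size * sizeof(sampleBuffer[0]);

// Hypothetical helper: how many 16-bit samples a read of `bytes` bytes fills.
constexpr size_t samples_filled(size_t bytes) {
    return bytes / sizeof(int16_t);
}

// Passing the sample count (2048) as a byte count fills only 1024 samples.
static_assert(samples_filled(2048) == 1024, "half of the buffer is filled");
static_assert(sizeof(sampleBuffer) == 4096, "full buffer is 4096 bytes");
```

Passing capture_size_bytes (4096) instead of sample_buffer_size (2048) as the read length is what fills all 2048 samples.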
There are some other issues with this example.
The microphone data is scaled up, and a lot of bandwidth is wasted:
// scale the data (otherwise the sound is too quiet)
for (int x = 0; x < i2s_bytes_to_read/2; x++) {
sampleBuffer[x] = (int16_t)(sampleBuffer[x]) * 8;
}
This scaling is needed because the I2S microphone delivers 24 bits, and when 16 bits is used as the recording setting (on ESP32),
only the top 8 of those 24 bits are captured, into the lower 8 bits of the recorded 16-bit sample (in reality only 8 bits are recorded),
effectively throwing a lot of information away. A better approach would be to capture in 32-bit and then convert to 16-bit ‘manually’, replacing the upscaling with a downscaling from 32 to 16 bits.
Since the audio samples are copied into the inference swap buffers anyway, that may be a more efficient place to perform the scaling.
Here is my implementation for a 24-bit I2S microphone.
Add a clip function, and add a 32-bit sample buffer (or replace the existing buffer with it):
static int32_t sampleBuffer32[sample_buffer_size];
int16_t clip32(int32_t value) {
if (value > INT16_MAX) return INT16_MAX;
if (value < INT16_MIN) return INT16_MIN;
return static_cast<int16_t>(value);
}
Configure I2S with 32 bits per sample:
` .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,`
Perform the conversion while copying to the inference buffers:
static void audio_inference_callback(uint32_t num_samples)
{
for (uint32_t i = 0; i < num_samples; i++) { // unsigned index matches num_samples
inference.buffers[inference.buf_select][inference.buf_count++] = clip32(sampleBuffer32[i]/4096);
if(inference.buf_count >= inference.n_samples) {
inference.buf_select ^= 1;
inference.buf_count = 0;
inference.buf_ready = 1;
}
}
}
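The effect of the divisor in the conversion above can be checked off-target with a small sketch. It assumes the microphone data is left-justified (MSB-aligned) in the 32-bit word, which is an assumption about the hardware rather than something confirmed here. The names clip_to_int16 and convert_sample are hypothetical and only used for illustration.

```cpp
#include <cstdint>

// Saturate a 32-bit value into the int16_t range (same idea as clip32 above).
static int16_t clip_to_int16(int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(value);
}

// Hypothetical helper: convert one 32-bit I2S word to a 16-bit sample.
// Assumption: the sample is left-justified in the 32-bit word.
//  - divisor 65536 (2^16) keeps exactly the top 16 bits, so it never clips.
//  - divisor 4096 (2^12) keeps 4 more low-order bits, i.e. a 16x gain
//    relative to 65536, so loud samples must be clipped.
static int16_t convert_sample(int32_t raw, int32_t divisor) {
    return clip_to_int16(raw / divisor);
}
```

With a divisor of 4096, quiet signals keep more of their low-order bits, at the cost of clipping once the raw value exceeds roughly 1/16 of full scale. Note also that with a 32-bit buffer the byte count passed to i2s_read would be sample_buffer_size * sizeof(sampleBuffer32[0]), and the sample count handed to the callback would be bytes_read / sizeof(int32_t).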
Actual Results:
Only half of the audio buffer is used.
Environment:
- Platform: [esp32s3]