ESP32 Audio Classification with INMP441

Hello everyone!
I am trying to adapt a program written for the Arduino Nano 33 BLE Sense to an ESP32 with an external microphone (INMP441), which works over I2S.

I was able to adapt the program and replace PDM with I2S, and it kind of works. It fills the buffers and runs a classification, but the classification is completely wrong. When I tested the same program on a Nano 33 BLE Sense, it classified correctly.

The audio samples that I used to train the model on Edge Impulse were recorded on the ESP32 with the INMP441 microphone, to rule out compatibility problems between hardware and software.

I think that maybe the problem is that I am missing some data but I am not really sure.

On the Nano it displays this: (DSP 123 ms, Classification: 19 ms, Anomaly: 0 ms)
On the ESP32 it displays this: (DSP 202 ms, Classification: 32 ms, Anomaly: 0 ms)
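
For context on those timings: in continuous mode each slice has to be processed before the next one finishes recording, otherwise audio is lost. The sketch below is a rough check of that budget. The helper names `slice_ms` and `keeps_up` are hypothetical, and the slice size of 4000 samples (a 1 s window split into 4 slices at 16 kHz) is an assumption; the real value is `EI_CLASSIFIER_SLICE_SIZE` in the exported model.

```cpp
#include <cstdint>

// Duration of one slice in milliseconds.
constexpr uint32_t slice_ms(uint32_t slice_samples, uint32_t sample_rate_hz) {
    return (slice_samples * 1000u) / sample_rate_hz;
}

// True when DSP + classification finish within the duration of one slice.
constexpr bool keeps_up(uint32_t slice_samples, uint32_t sample_rate_hz,
                        uint32_t dsp_ms, uint32_t classification_ms) {
    return dsp_ms + classification_ms <= slice_ms(slice_samples, sample_rate_hz);
}
```

Under that assumption a slice lasts 250 ms, so the ESP32 figures (202 + 32 = 234 ms) would fit, but with much less margin than the Nano figures (123 + 19 = 142 ms), assuming the reported numbers are per slice.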

I need help, what else do you need to know?
@janjongboom Please advise.
Thanks in advance!
Best regards…

Hi @AworkingM,

Do you use the same code to fill the buffer to acquire data and to run classification?
It would be helpful if you could share your sample code as well as your project ID.

One thing you can also try is testing the classification on its own, without your mic driver, by filling the features array with a raw data sample.
If it classifies the data correctly, we can confirm that the issue comes from the mic driver.


Thanks so much for responding @aurel !

Yes, I have run live classifications and they work correctly. The Edge Impulse part of creating the model works just fine. To verify it, I uploaded the project to a Nano 33 BLE Sense and confirmed that it worked incredibly well.

The part I am struggling with is adapting the I2S code to fill the buffers. I more or less got it working, but I believe the problem is there, because the classification is not accurate.

My project ID is 75837

The code of the project is very long, but the part that retrieves the data is this (everything else is almost the same):

//*************************   I2S Configuration   *************************//

#include <driver/i2s.h>
#define I2S_NUM           I2S_NUM_0           // 0 or 1
#define I2S_SAMPLE_RATE   16000
#define I2S_PORT          I2S_NUM_0
#define I2S_PIN_CLK       2
#define I2S_PIN_WS        15
#define I2S_PIN_DIN       13
#ifndef I2S_PIN_DOUT
// Assumption: I2S_PIN_DOUT is not defined in the posted snippet.
// The mic is receive-only, so no data-out pin is needed.
#define I2S_PIN_DOUT      I2S_PIN_NO_CHANGE
#endif

void initI2S(){
  i2s_config_t i2s_config = {
    .mode                 = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate          = I2S_SAMPLE_RATE,
    .bits_per_sample      = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format       = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags     = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count        = 6,
    .dma_buf_len          = 1024,
    .use_apll             = false,
    .tx_desc_auto_clear   = false,
    .fixed_mclk           = 0
  };
  i2s_pin_config_t pin_config = {
    .bck_io_num           = I2S_PIN_CLK,
    .ws_io_num            = I2S_PIN_WS,
    .data_out_num         = I2S_PIN_DOUT,
    .data_in_num          = I2S_PIN_DIN,
  };

  i2s_driver_install(I2S_NUM, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM, &pin_config);
}

/**
 * @brief      i2s buffer full callback
 *             Get data and call audio thread callback
 */
static void i2s_data_ready_inference_callback(void *arg){
  size_t bytesRead;
  int i2s_read_len = 5333;
  i2s_read(I2S_PORT, (void*) sampleBuffer, i2s_read_len, &bytesRead, portMAX_DELAY);
  if (record_ready == true) {
    for (int i = 0; i < (bytesRead >> 1); i++) {
      inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];
      if (inference.buf_count >= inference.n_samples) {
        inference.buf_select ^= 1;
        inference.buf_count = 0;
        inference.buf_ready = 1;
      }
    }
  }
}
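
One detail in the callback above that may be worth ruling out: `i2s_read_len` is 5333 bytes, an odd number, while each sample is 2 bytes. `bytesRead >> 1` drops the trailing half sample, and I assume the next read then starts in the middle of a sample, so the bytes would be paired wrongly from then on. Below is a hardware-free sketch of the same double-buffer logic with the read length rounded down to whole samples. `InferenceBuffers`, `even_read_len` and `push_samples` are hypothetical names, not part of the Edge Impulse example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the `inference` struct used in the sketch.
struct InferenceBuffers {
    int16_t *buffers[2];
    uint8_t  buf_select;
    uint8_t  buf_ready;
    uint32_t buf_count;
    uint32_t n_samples;
};

// Round a byte count down to a whole number of 16-bit samples.
constexpr size_t even_read_len(size_t bytes) {
    return bytes & ~static_cast<size_t>(1);
}

// Copy `count` samples into the active buffer, swapping buffers when one fills up.
void push_samples(InferenceBuffers &inf, const int16_t *src, size_t count) {
    for (size_t i = 0; i < count; i++) {
        inf.buffers[inf.buf_select][inf.buf_count++] = src[i];
        if (inf.buf_count >= inf.n_samples) {
            inf.buf_select ^= 1;
            inf.buf_count = 0;
            inf.buf_ready = 1;
        }
    }
}
```

On the ESP32 this would mean calling `i2s_read` with `even_read_len(5333)` (5332 bytes) and then `push_samples(inference, sampleBuffer, bytesRead / sizeof(int16_t))`. Separately, the INMP441 sends 24-bit samples in 32-bit slots, so some ports read 32-bit words and shift them down to 16 bits; whether that matters here depends on the driver configuration.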

Can you also share how you fill the signal structure in your main loop?

Of course @aurel!

It is the same as the continuous example:

/**
 * @brief      Arduino main function. Runs the inferencing loop.
 */
void loop()
{
    // Wait until one slice of audio is available
    bool m = microphone_inference_record();
    if (!m) {
        ei_printf("ERR: Failed to record audio...\n");
        return;
    }

    signal_t signal;
    signal.total_length = EI_CLASSIFIER_SLICE_SIZE;
    signal.get_data = &microphone_audio_signal_get_data;
    ei_impulse_result_t result = {0};

    EI_IMPULSE_ERROR r = run_classifier_continuous(&signal, &result, debug_nn);
    if (r != EI_IMPULSE_OK) {
        ei_printf("ERR: Failed to run classifier (%d)\n", r);
        return;
    }

    if (++print_results >= (EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)) {
        // print the predictions
        ei_printf("Predictions ");
        ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",
            result.timing.dsp, result.timing.classification, result.timing.anomaly);
        ei_printf(": \n");
        for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
            ei_printf("    %s: %.5f\n", result.classification[ix].label,
                      result.classification[ix].value);
        }
#if EI_CLASSIFIER_HAS_ANOMALY == 1
        ei_printf("    anomaly score: %.3f\n", result.anomaly);
#endif

        print_results = 0;
    }
}

The only parts that I changed were the callback and the part that starts the mic. The rest is the same…