Using Keyword Spotting on DFR1154 AI Camera

Question/Issue:
I want to implement an Edge Impulse model on the DFRobot DFR1154 AI Camera. Here is a first demo with ChatGPT.

I want to use keyword spotting as a trigger before capturing audio and sending it to ChatGPT. I followed the Edge Impulse tutorial and tour, but I am stuck on the last step: deploying to the ESP32. ChatGPT keeps hallucinating instead of giving the right answers, and the board reboots when I call run_classifier().

Project ID:
My project ID should be 678141. I followed all the steps to generate a “Sarah” keyword model, keeping the default “Quantized” option.

Context/Use case:
I’m new to Edge Impulse and just followed the tutorial because I want to give edge AI a try, and it seems the best way to do it :slight_smile:

The goal is to perform keyword spotting, then capture audio and send it to ChatGPT. See the YouTube video demo.

Steps Taken:

  1. Followed the tutorial and trained on the “Sarah” keyword
  2. Downloaded the default zip (the quantized model, I assume)
  3. Added the library and my code to the project to make it work

I think (and ChatGPT is obsessed with this) the problem comes:

  • from a memory or size issue because of the model (I don’t believe so)
  • from memory management or a leak with malloc, or
  • from using 16-bit vs 8-bit samples, etc.
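On the 16-bit vs 8-bit point, a minimal sketch of what the int16 → int8 step (`>> 8`) in the code below does to the samples. It assumes, as in the Edge Impulse Arduino microphone examples that call `numpy::int16_to_float`, that the DSP block expects the raw 16-bit sample values as floats with no rescaling, and that “Quantized” (int8) refers to the model, not to the audio it is fed. Both helper names are hypothetical and the code is plain C++ so it can be checked off-device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reproduces the int16 -> int8 -> float path for one
// sample (arithmetic shift by 8, clamp to int8, then cast to float).
static float via_int8(int16_t sample) {
    int val = sample >> 8;  // keeps only the top 8 bits
    if (val > 127) val = 127;
    if (val < -128) val = -128;
    return static_cast<float>(static_cast<int8_t>(val));
}

// Hypothetical helper: direct int16 -> float conversion with no rescaling,
// which is what numpy::int16_to_float is assumed to do.
static void int16_to_float_direct(const int16_t *in, float *out, size_t n) {
    assert(in != nullptr && out != nullptr);
    for (size_t i = 0; i < n; i++) {
        out[i] = static_cast<float>(in[i]);
    }
}
```

With the 8-bit path, a sample of 1000 becomes 3.0 and any small positive sample below 256 becomes 0, so quiet speech largely disappears before feature extraction. Converting straight from int16 also makes the 16,000-byte int8 buffer unnecessary.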

Expected Outcome:
I expect to get a score for the keyword spotting result.
Then I’ll put the code into a task.
Then I’ll trigger the rest of my code that sends audio to ChatGPT, takes a picture, …
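For the “get a score, then trigger the rest” step, a minimal sketch of the decision logic. The `LabelScore` struct is a hypothetical stand-in for the `label`/`value` pair in each `result.classification[]` entry so the code compiles off-device; the keyword string `"sarah"` and the 0.8 threshold are assumptions, since the real label comes from the project's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for one entry of result.classification[] (label + score).
struct LabelScore {
    const char *label;
    float value;
};

// Hypothetical helper: true when `keyword` has the highest score and that
// score is at least `threshold`.
static bool keyword_detected(const LabelScore *scores, size_t count,
                             const char *keyword, float threshold) {
    assert(scores != nullptr && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (scores[i].value > scores[best].value) best = i;
    }
    return std::strcmp(scores[best].label, keyword) == 0 &&
           scores[best].value >= threshold;
}
```

On the device, the same loop can run directly over `result.classification[ix].label` and `.value` for `ix < EI_CLASSIFIER_LABEL_COUNT`, and a `true` result would be the point where the ChatGPT/camera code starts.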

Actual Outcome:
The board reboots on run_classifier()

Reproducibility:

  • [x] Always

Environment:

  • Platform: Arduino v3 on DFR1154_ESP32_S3_AI_CAM
  • Build Environment Details: PlatformIO on VSCode
  • OS Version: Windows 11
  • Edge Impulse Version (Firmware): ? Major 1 / Minor 71 / Patch 29
  • Project Version: 2

Logs/Attachments:
No logs. It compiles correctly; the last output before the reboot comes from Serial.println("Run classifier...");

Additional Information:

I’m using PlatformIO, and I had a hard time figuring out that I just have to declare the zip by itself in lib_deps. The project is configured to use Arduino core v3, and therefore ESP_I2S.h:

```ini
[platformio]
src_dir = src

[env:esp32-s3-aicam]
platform = https://github.com/pioarduino/platform-espressif32/releases/download/stable/platform-espressif32.zip
board = esp32-s3-devkitc1-n16r8
framework = arduino

build_flags =
    -w
    -DBOARD_HAS_PSRAM
    -DARDUINO_USB_CDC_ON_BOOT=1
    -DCORE_DEBUG_LEVEL=1
    -DCONFIG_WIFI_ENABLED

lib_deps =
    mathertel/OneButton@^2.6.1
    gilmaimon/ArduinoWebsockets@^0.5.4
    bblanchon/ArduinoJson@^7.3.1
    ./lib/keyword-spotting-v2.zip
```

With the help of ChatGPT, I wrote the following code, which uses #include "ESP_I2S.h":

```cpp
#include "ESP_I2S.h"
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"
#include "model-parameters/model_metadata.h"
#include "tflite-model/tflite_learn_5_compiled.h"

// Defined after the SDK headers are included, so it cannot change how they compile.
#define EI_CLASSIFIER_USE_QUANTIZED 1
#define EI_MAX_AUDIO_SAMPLES 16000  // 1 second at 16 kHz

static int8_t audio_data_int8[EI_MAX_AUDIO_SAMPLES];  // 16,000 bytes
static float audio_data_float[EI_MAX_AUDIO_SAMPLES];  // 64,000 bytes

// i2s_rec is the I2SClass instance set up elsewhere in the sketch (not shown).
void runEdgeImpulse() {
  Serial.println("Record 1 second of audio...");
  size_t wav_size = 0;
  uint8_t* wav_buffer = i2s_rec.recordWAV(1, &wav_size);
  if (!wav_buffer) {
    Serial.println("Error: wav_buffer");
    return;
  }

  Serial.println("Prepare audio...");
  // wav_buffer starts with the WAV header; here the header bytes are
  // treated as samples too.
  int16_t* audio_data_raw = (int16_t*)wav_buffer;
  size_t sample_count = wav_size / sizeof(int16_t);
  if (sample_count > EI_MAX_AUDIO_SAMPLES) sample_count = EI_MAX_AUDIO_SAMPLES;

  Serial.println("Convert int16_t to int8_t...");
  // Keeps only the top 8 bits of each 16-bit sample.
  for (size_t i = 0; i < sample_count; i++) {
    int val = audio_data_raw[i] >> 8;
    if (val > 127) val = 127;
    if (val < -128) val = -128;
    audio_data_int8[i] = (int8_t)val;
  }

  Serial.println("Convert int8_t to float...");
  if (numpy::int8_to_float(audio_data_int8, audio_data_float, sample_count) != 0) {
    Serial.println("Error: int8_to_float");
    free(wav_buffer);
    return;
  }

  Serial.println("Build signal...");
  signal_t signal;
  if (numpy::signal_from_buffer(audio_data_float, sample_count, &signal) != 0) {
    Serial.println("Error: signal_from_buffer");
    free(wav_buffer);
    return;
  }

  Serial.println("Run classifier...");  // last line printed before the reboot
  ei_impulse_result_t result = {};
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    Serial.println("Error: run_classifier");
    free(wav_buffer);
    return;
  }

  Serial.println("Print results...");
  for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    ei_printf("%s:\t%.5f\n", result.classification[ix].label, result.classification[ix].value);
  }

  free(wav_buffer);
}
```
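One thing worth checking in the code above: `recordWAV()` returns a WAV file, so the buffer begins with a RIFF/WAVE header rather than audio. A minimal sketch of locating the 16-bit PCM payload, assuming a canonical 44-byte PCM header (`RIFF`, `WAVE`, a 16-byte `fmt ` chunk, then `data` at offset 36) such as the one the ESP_I2S library writes. The function name is hypothetical, and it is a strict check for that one layout, not a general WAV parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size of a canonical PCM WAV header (RIFF + fmt chunk + data chunk header).
static const size_t kWavHeaderSize = 44;

// Hypothetical helper: if `wav` starts with a canonical 44-byte PCM header,
// set *samples to the start of the 16-bit payload and return the sample count.
// Returns 0 (and sets *samples to nullptr) if the layout does not match.
static size_t wav_pcm16_payload(const uint8_t *wav, size_t wav_size,
                                const int16_t **samples) {
    assert(samples != nullptr);
    *samples = nullptr;
    if (wav == nullptr || wav_size < kWavHeaderSize) return 0;
    if (std::memcmp(wav, "RIFF", 4) != 0 ||
        std::memcmp(wav + 8, "WAVE", 4) != 0 ||
        std::memcmp(wav + 36, "data", 4) != 0) {
        return 0;
    }
    // The data chunk size is a little-endian uint32 at offset 40.
    uint32_t data_bytes = static_cast<uint32_t>(wav[40]) |
                          (static_cast<uint32_t>(wav[41]) << 8) |
                          (static_cast<uint32_t>(wav[42]) << 16) |
                          (static_cast<uint32_t>(wav[43]) << 24);
    // Never report more bytes than the buffer actually holds.
    size_t available = wav_size - kWavHeaderSize;
    if (data_bytes > available) data_bytes = static_cast<uint32_t>(available);
    *samples = reinterpret_cast<const int16_t *>(wav + kWavHeaderSize);
    return data_bytes / sizeof(int16_t);
}
```

On the device, the returned pointer and count would take the place of `audio_data_raw` and `sample_count`, so the first samples handed to the classifier are audio instead of header bytes.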

My ei_shim.cpp contains:

```cpp
// ei_shim.cpp
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "esp_timer.h"

extern "C" {
    // Heap allocation, forwarded to the C library.
    void* ei_malloc(size_t size) { return malloc(size); }
    void* ei_calloc(size_t n, size_t size) { return calloc(n, size); }
    void ei_free(void* ptr) { free(ptr); }

    // printf-style logging to stdout.
    void ei_printf(const char *format, ...) {
        va_list args;
        va_start(args, format);
        vprintf(format, args);
        va_end(args);
    }

    void ei_printf_float(float f) { printf("%f\n", f); }

    // Microseconds since boot (esp_timer_get_time() returns int64_t).
    uint64_t ei_read_timer_us() { return esp_timer_get_time(); }

    // Never cancel a running inference.
    bool ei_run_impulse_check_canceled() { return false; }
}
```
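To test the malloc-leak theory, a minimal sketch of a counting allocator that the shim's `ei_malloc`/`ei_calloc`/`ei_free` could forward to, with the live count printed after each `run_classifier()` call. The wrapper and counter names are hypothetical (not part of the Edge Impulse SDK), and the plain counters assume everything runs on a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical counters; not thread-safe (plain size_t, no atomics).
static size_t g_live_allocs = 0;   // allocations not yet freed
static size_t g_total_allocs = 0;  // successful allocations so far

static void *counting_malloc(size_t size) {
    void *p = std::malloc(size);
    if (p != nullptr) { g_live_allocs++; g_total_allocs++; }
    return p;
}

static void *counting_calloc(size_t n, size_t size) {
    void *p = std::calloc(n, size);
    if (p != nullptr) { g_live_allocs++; g_total_allocs++; }
    return p;
}

static void counting_free(void *p) {
    if (p == nullptr) return;   // free(NULL) is a no-op, so don't count it
    assert(g_live_allocs > 0);  // more frees than allocations would be a bug
    g_live_allocs--;
    std::free(p);
}
```

If the live count returns to the same value after every inference, there is no leak. And if the reboot already happens on the very first call, a leak that builds up over repeated calls seems an unlikely cause, but this makes it easy to rule out.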