Seeking Bit-Perfect DSP Feature Extraction Tool in Python/Notebook for Model Alignment

Question/Issue:

I am experiencing a numerical mismatch between the feature extraction pipeline implemented in the generated C++ SDK (specifically the MFE block, as seen in edge-impulse-run-dsp.h) and standard Python/TensorFlow libraries (e.g., tf.signal or librosa). This mismatch prevents me from accurately aligning my model training results (in Python) with the inference results (on the target device running the C++ SDK).

I need a bit-perfect replication of the Edge Impulse C++ DSP pipeline that can be run in a Python/Jupyter Notebook environment, so that I can train my model on features that exactly match those computed during on-device inference.

Does Edge Impulse provide a Python utility, module, or Jupyter Notebook template that guarantees the exact same numerical result as the deployed C++ DSP?

Project ID:

Context/Use case:

I am training a custom model using a Python framework (TensorFlow/Keras). The model requires MFE features extracted from audio data. To ensure successful deployment, the MFE feature vector used for training must exactly match the feature vector generated by the C++ runtime on the target device.

Steps Taken:

  1. Extracted MFE features using the generated C++ SDK on the target.
  2. Extracted MFE features using Python (tf.signal.stft + tf.signal.linear_to_mel_weight_matrix), ensuring all parameters (frame length, stride, Mel bins, sample rate) match the C++ configuration (see the sketch after this list).
  3. Disabled Pre-emphasis in both C++ and Python to eliminate that variable.
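
For reference, here is a minimal sketch of the tf.signal pipeline from step 2. The parameter values are placeholders for the project's actual MFE settings; the comments point out the usual sources of divergence from a C++ implementation (windowing, Mel formula variant, log scaling):

```python
import tensorflow as tf

# Placeholder parameters: replace with the values from the MFE block configuration.
SAMPLE_RATE = 16000
FRAME_LENGTH = 400   # 25 ms at 16 kHz
FRAME_STEP = 160     # 10 ms at 16 kHz
FFT_LENGTH = 512
NUM_MEL_BINS = 40
LOWER_HZ, UPPER_HZ = 0.0, 8000.0

def mfe_tf(audio):
    """MFE-style features via tf.signal; audio is a 1-D float32 tensor in [-1, 1]."""
    # tf.signal.stft applies a periodic Hann window by default; a C++ block
    # that windows differently will already break bit-exactness here.
    stft = tf.signal.stft(audio, frame_length=FRAME_LENGTH,
                          frame_step=FRAME_STEP, fft_length=FFT_LENGTH)
    power = tf.math.square(tf.abs(stft))
    # tf.signal uses the HTK Mel formula; other implementations may use Slaney.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=NUM_MEL_BINS,
        num_spectrogram_bins=FFT_LENGTH // 2 + 1,
        sample_rate=SAMPLE_RATE,
        lower_edge_hertz=LOWER_HZ,
        upper_edge_hertz=UPPER_HZ)
    mel = tf.matmul(power, mel_matrix)
    # Log scaling and noise-floor handling also differ between implementations.
    return tf.math.log(mel + 1e-6)
```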

Expected Outcome:

The floating-point feature vectors generated by the C++ code should be numerically identical (or nearly identical, allowing for minor floating-point differences) to the feature vectors generated by the Python tool.
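
As a rough illustration of the "nearly identical" criterion, a tolerance check along these lines (NumPy; the dump file names and tolerances are hypothetical) would pass if only minor floating-point differences remained:

```python
import numpy as np

# Hypothetical dumps: one feature vector logged from the device, one from the notebook.
cpp_features = np.loadtxt("cpp_features.txt", dtype=np.float32)
py_features = np.loadtxt("py_features.txt", dtype=np.float32)

if np.allclose(cpp_features, py_features, rtol=1e-4, atol=1e-5):
    print("Feature vectors match within float tolerance")
else:
    diff = np.abs(cpp_features - py_features)
    print(f"Max abs diff: {diff.max():.6g} at index {diff.argmax()}")
```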

Actual Outcome:

The feature vectors show significant numerical discrepancies, likely due to library implementation differences, FFT result handling, or rounding logic (such as the 0 Hz to 300 Hz check in older C++ versions).

Reproducibility:

  • [x] Always

Environment:

  • Platform: [Ex: ESP32, Raspberry Pi, Custom board]
  • Build Environment Details: [Ex: Visual Studio Code]
  • OS Version: [Ex: Ubuntu 22.04]
  • Edge Impulse Version (Firmware):
  • Edge Impulse CLI Version:

Logs/Attachments:

Additional Information:

My primary concern is model accuracy degradation due to this feature alignment issue. A bit-perfect Python DSP module would solve this training/inference alignment problem.

If you require the source Python implementation for any of our DSP blocks, they're all available here: GitHub - edgeimpulse/processing-blocks: Signal processing blocks
These are the same blocks we use for training your model, and they match the on-device implementation.
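
For example, the MFE block could be called from a notebook roughly like this. The module path and the generate_features parameter list below are taken from one revision of the repo and should be checked against the current mfe/dsp.py:

```python
import sys
import numpy as np

# Assumes the repo has been cloned next to the notebook:
#   git clone https://github.com/edgeimpulse/processing-blocks
sys.path.append("processing-blocks/mfe")
import dsp  # processing-blocks/mfe/dsp.py

raw = np.loadtxt("sample.txt", dtype=np.float32)  # hypothetical raw audio dump

# Parameter names follow one revision of mfe/dsp.py; verify them against the
# current repo and mirror the exact values used by the on-device MFE config.
result = dsp.generate_features(
    implementation_version=4,
    draw_graphs=False,
    raw_data=raw,
    axes=["audio"],
    sampling_freq=16000,
    frame_length=0.025,
    frame_stride=0.010,
    num_filters=40,
    fft_length=256,
    low_frequency=0,
    high_frequency=8000,
    win_size=101,
    noise_floor_db=-52,
)
features = np.array(result["features"], dtype=np.float32)
```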

Thank you so much for the response!
If I use Edge Impulse's own DSP code (processing-blocks) to generate the spectrograms, train the model externally in Python, and then import the resulting TFLite model into Edge Impulse, this should work without issues, correct?

In this setup, is there any additional step or configuration required in the Edge Impulse C++ SDK (ESP32), or will it run normally as long as the DSP and model match?
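
For reference, the TFLite export step in that plan could look like this (the model below is a stand-in for the externally trained network; the feature shape is illustrative):

```python
import tensorflow as tf

# Stand-in for the externally trained Keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(40 * 99,)),  # flattened MFE feature vector (illustrative shape)
    tf.keras.layers.Dense(8, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
# model.tflite can then be uploaded to Edge Impulse via the BYOM flow.
```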

If you're using a "BYOM" model trained outside the platform, you can still access the C++ DSP code on device when deploying; you'll just need to call the DSP function with your chosen parameters (make sure they match the Python version). See Bring your own model (BYOM) - Edge Impulse Documentation.
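
One way to keep the two sides aligned is to define the DSP parameters in a single place and reuse them on both sides; a minimal sketch (the file name and parameter names are illustrative, not an Edge Impulse API):

```python
import json

# Single source of truth for the DSP parameters. Load this file in the Python
# feature-extraction notebook, and copy the same values into the MFE config
# passed to the C++ DSP call on the ESP32.
DSP_PARAMS = {
    "sampling_freq": 16000,
    "frame_length": 0.025,
    "frame_stride": 0.010,
    "num_filters": 40,
    "fft_length": 256,
    "low_frequency": 0,
    "high_frequency": 8000,
}

with open("dsp_params.json", "w") as f:
    json.dump(DSP_PARAMS, f, indent=2)
```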

If you train within the Edge Impulse platform, this is all handled automatically. It might also be worth building your training code into a simple custom learning block; then you can use all the other features of Edge Impulse to train and compile. See Custom learning blocks - Edge Impulse Documentation.