Use PSRAM for impulse deployment on ESP32-S3

Hi

I’ve trained a small classifier model (project ID: 563549). For deployment, I want to run it on an ESP32-S3, so I exported the Arduino library and tried running the static_buffer.ino example in both the Arduino IDE and PlatformIO. I’ve been getting this error:

ERR: Failed to allocate persistent buffer of size 512, does not fit in tensor arena and reached EI_MAX_OVERFLOW_BUFFER_COUNT
Guru Meditation Error: Core  1 panic'ed (StoreProhibited). Exception was unhandled.

My ESP32 board has 8 MB of PSRAM, and I was wondering whether the internal SDK uses the PSRAM or not. If it isn’t used by default, how can I enable it so that the model runs on my edge device?

Thanks!

After experimenting further and moving past that error, I ran the same code again and got this return code:

EI_IMPULSE_TFLITE_ARENA_ALLOC_FAILED = -6

Serial output:

ERR: Failed to run classifier (-6)
Edge Impulse standalone inferencing (Arduino)
ERR: failed to allocate tensor arena

It’s hard to pinpoint what went wrong or what else I can try to get this running on the ESP32-S3. Any help is much appreciated, thanks!

PSRAM allocations are managed automatically by ESP-IDF; see “Support for External RAM” in the ESP-IDF Programming Guide (v5.2.3) for more info.
I.e., if you have it enabled, the firmware will use it the same way it uses normal RAM.
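If you want to double-check from the sketch side, a minimal check looks something like this (assuming the arduino-esp32 core; psramFound(), ESP.getFreePsram() and ps_malloc() are its standard helpers):

#include <Arduino.h>

void setup() {
  Serial.begin(115200);
  delay(2000);

  if (psramFound()) {
    // The core detected and initialized PSRAM at boot
    Serial.printf("PSRAM: %u bytes free of %u\n",
                  ESP.getFreePsram(), ESP.getPsramSize());

    // ps_malloc() allocates explicitly from PSRAM
    void *p = ps_malloc(1024 * 1024);
    Serial.printf("1 MB PSRAM allocation %s\n", p ? "succeeded" : "failed");
    free(p);
  } else {
    Serial.println("PSRAM not found - check BOARD_HAS_PSRAM / board settings");
  }
}

void loop() {}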

Do you have it enabled for your project?
Also, how exactly did you move past that error and run the same code? If you increased the arena size and are now running into ERR: failed to allocate tensor arena, it may be that you don’t have PSRAM enabled for the project…

Hi @AIWintermuteAI

Thanks for the reply.

I think the code is using PSRAM automatically, yes; I checked by printing the heap and PSRAM consumption (the snippet I used is below, after the config). I made some changes here and there in the configuration and reduced the model input size from a 96x96 image to a 48x48 image. Still the same ‘Failed to run classifier’ error. Please find my project configuration (platformio.ini) below:

; PlatformIO Project Configuration File
;
;   Build options: build flags, source filter
;   Upload options: custom upload port, speed and extra flags
;   Library options: dependencies, extra library storages
;   Advanced options: extra scripting
;
; Please visit documentation for the other options and examples
; https://docs.platformio.org/page/projectconf.html

[env:esp32-s3-devkitc-1]
platform = espressif32
board = esp32-s3-devkitc-1
framework = arduino
board_build.mcu = esp32s3

; board_build.flash_mode = dio              ; tried these as well
; board_build.memory_type = dio_opi         ; tried these as well
board_build.flash_mode = qio
board_build.memory_type = qio_opi
board_build.partitions = min_spiffs.csv
board_build.f_flash = 80000000L
board_build.f_cpu = 240000000L

board_upload.maximum_ram_size = 8388608
board_upload.flash_size = 16MB
board_upload.maximum_size = 16777216

build_unflags = -std=gnu++11
build_flags = 
	-std=gnu++17
	-DBOARD_HAS_PSRAM
	-DARDUINO_ESP32S3_DEV
	-DARDUINO_USB_CDC_ON_BOOT=1
lib_deps = 
	.\ei-bay-occupancy-arduino-1.0.6.zip

upload_speed = 115200
monitor_speed = 115200
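For reference, this is roughly the snippet I used to watch the memory consumption (standard arduino-esp32 ESP helpers, printed from loop()):

// Rough memory check: heap vs. PSRAM usage over time
Serial.printf("Heap:  %u free of %u bytes\n", ESP.getFreeHeap(), ESP.getHeapSize());
Serial.printf("PSRAM: %u free of %u bytes\n", ESP.getFreePsram(), ESP.getPsramSize());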

Should I reduce the input size even further and try again? Or are there other changes I can make? Also, I did not increase the arena size myself, and I would really appreciate your help with increasing the default tensor arena allocation; that may solve the problem, who knows?

Thanks again,
Garvit

@AIWintermuteAI Here’s a weird thing I found while experimenting:

The project, even with a 96x96 input size, worked on an ESP32-CAM board (AI Thinker) but is not working on boards with the ESP32-S3 chip. Is it something related to the ESP32-S3 chip?

No, 96x96 should run with no issues on the S3.
Can you try manually increasing the model arena size? It can be found in model_variables.h. Try adding a few hundred KB; if you have PSRAM, this much should not be an issue.
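As a sketch of what to look for (the exact symbol name varies between SDK versions, so take the names and numbers below as illustrative only):

// In the exported library, e.g. src/model-parameters/model_variables.h.
// Find the arena size constant/field for the TFLite graph; if it reads
// something like
//     .arena_size = 100 * 1024,
// bump it by a few hundred KB, e.g.
//     .arena_size = 400 * 1024,
// With PSRAM enabled, an arena of this size should still fit comfortably.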


Hi @AIWintermuteAI

You are correct; increasing the arena size by a few KB made it run on the S3 too, though I wonder why this is happening. Now that it’s working, I don’t mind setting it a bit higher manually for the S3. Thanks for the help.

Now, one blunder I’m noticing, which has been boggling my mind since yesterday: each time I run the example, even with different inputs, I get exactly the same result. I have a binary classifier, and almost every time I get the exact same impulse output, predicting one class with 98% confidence. In the Studio it looks like it is working just fine. For reference: I’m using a binary classifier (greyscale input), a quantized model built with the EON Compiler, and the static_buffer example.

Example result I’m getting:

Predictions (time: 675 ms.):
free:0.984375
occupied:0.015625
run_classifier returned: 0
Timing: DSP 5 ms, inference 675 ms, anomaly 0 ms

Somewhere on the forum I read about using the function run_classifier_image_quantized instead of run_classifier when using a quantized model. Could this be the issue? I tried the former function but couldn’t make it work because of its first argument (impulse), which I could not access from my main file.

Project id: 563549.

Any help/suggestions are much appreciated. Many thanks!

Garvit

static_buffer runs inference on static data, so you SHOULD be getting the same output. You need to run the camera example to get the image from the camera.

Re: increasing the arena size made it work.

This is related to the fact that the ESP32-S3 uses optimized kernels, which have different arena size requirements due to the scratch buffers used in those kernels. We benchmark with regular kernels or CMSIS-NN kernels, so we don’t know the exact arena size needed on the S3. We do leave some wiggle room, but apparently in this case it was not enough.

Hi @AIWintermuteAI

Thanks for responding. Sorry for the confusion, but I meant that even when I change the static input inside the static_buffer.ino example, it gives the same result. It gets its input from the ‘features’ variable that holds the flattened processed-feature array, right? So even if I paste another array there and flash the firmware again, the response is always identical.

It gets its input from the ‘features’ variable that holds the flattened processed-feature array, right?

Yes, this is correct.
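For reference, the relevant part of static_buffer.ino looks roughly like this (simplified; run_inference is just an illustrative wrapper around what the example does in loop()):

// Paste the *raw* features from Studio here; for an image impulse these
// are hex pixel values such as 0x383838, not the processed float features
static const float features[] = {
    // 0x383838, 0x373737, ...
};

// Callback the SDK uses to pull slices of the input buffer
static int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return 0;
}

void run_inference() {
    signal_t signal;
    signal.total_length = sizeof(features) / sizeof(features[0]);
    signal.get_data = &raw_feature_get_data;

    ei_impulse_result_t result = { 0 };
    EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false /* debug */);
    ei_printf("run_classifier returned: %d\n", res);
}

The example also checks the pasted array length against EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE and prints an error if it doesn’t match, which is a useful sanity check that you pasted the right array.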

So even if I paste another array there and flash the firmware again, the response is always identical.

The important thing here is whether it matches the classification results from Studio. If it does, the device inference works correctly and the issue is with the model. Do the results match?

Hi @AIWintermuteAI

Yes, the predictions are correct in the Studio; they’re only messed up when I run the model on the ESP32. The results don’t match at all.

Did you find a solution for this anomaly? I have the same problem with sound classification.

Here in the Seeed wiki they have a solution, but it’s not working for me; it gives the same mixed results as before.

Hi @garvit185

Did you try the solution detailed above? Also, please use our docs for reference or for highlighting bugfixes if you can; we don’t have control over the Seeed ones and they may be out of sync.

Can you try manually increasing the model arena size? It can be found in model_variables.h. Try adding a few hundred KB; if you have PSRAM, this much should not be an issue.

Best

Eoin

@garvit185 ,
I tested your project on T-Camera S3 and could not reproduce the results mismatch. After tweaking the arena size, the results from Studio match the results on the device:
[two screenshots: classification results from Studio and from the device, matching]

@eduard please create a new forum thread with a detailed description (follow the template) and the steps to reproduce.

Hi @AIWintermuteAI & @Eoin

I found what I was doing wrong. In the input features, instead of copying the raw features, I was pasting in the processed-feature array. I got confused by the nomenclature: the input type was float and the raw features were hexadecimal, so I assumed the input should be the processed-feature array, since those were of type float.

Once I tried to do it correctly, the IDE showed me errors when I added the raw features.

But thanks to your screenshot, I could figure out what I was doing wrong. The other issue is also solved after increasing the tensor arena size on the ESP32-S3 chip.

Thanks so much, guys, for your support and patience.

If it is of any help, I would suggest that pasting the processed features instead of the raw features should throw an error.

Many thanks,
Garvit