I’ve managed to make the tutorial work on my ESP32 AI thinker board.
It should also work on the Wrover module. I had to fix a few things in the code example provided by the tutorial.
Hello @aurel
I want to ask again.
I’ve tried with a framesize above 400x296 and it failed.
Is it possible to use a framesize above 400x296?
If so, what should I do?
hi @louis
Thank you for sharing. I tried the code you provided with my own Edge Impulse model and it works. But no matter what kind of pictures I take, the output is always the same: [Predictions Scores 0.99609, 0.00391]. Is something wrong? I use the QVGA image format.
thank you
Ji
Hi @janjongboom @louis
Here are today’s new questions.
I have set up a flower classification task according to the tutorial. There are two problems still bothering me.
1. FRAMESIZE_240X240 is not declared, even though I added it to the <sensor.h> header file. So I had to use the QVGA (320x240) image size. How can I get a 240x240 image?
2. No matter what photos I take, only the scores of two categories change. As shown in the figure below, the scores for dandelion & unknown are always zero. But when I save the photos and import them into my Edge Impulse project as a testing dataset, there are scores for all four categories; those are the real values. Does anyone know what’s wrong?
Which board are you using? I’m using the AI Thinker module and FRAMESIZE_240X240 seems to work well. Unfortunately, I don’t have a Wrover module with me, so I can’t test it right away.
And which camera model are you using?
The QVGA framesize should not be a problem; however, you need to modify these lines (line 29) to match the actual frame size you are using:
// raw frame buffer from the camera
#define FRAME_BUFFER_COLS 240
#define FRAME_BUFFER_ROWS 240
Note that the int cutout_get_data(size_t offset, size_t length, float *out_ptr) function does not do a resize. Have a look at the piece of code @janjongboom wrote some time ago for a better understanding.
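To make the "no resize" point concrete, here is a minimal sketch of what a cutout function of this shape does: it copies a fixed window from the center of the raw RGB565 frame and converts each pixel to a packed RGB888 float. This is an illustrative reconstruction, not the exact code from the repo; the window placement and packed-float convention are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Raw frame buffer dimensions (must match the camera framesize).
constexpr size_t FRAME_BUFFER_COLS = 240;
constexpr size_t FRAME_BUFFER_ROWS = 240;
// Model input dimensions: pixels outside this window are simply dropped,
// i.e. the frame is cropped, never scaled down.
constexpr size_t CUTOUT_COLS = 48;
constexpr size_t CUTOUT_ROWS = 48;

// Convert one RGB565 pixel to a packed 0xRRGGBB value stored as a float.
static float rgb565_to_packed_rgb(uint16_t p) {
    uint32_t r = ((p >> 11) & 0x1F) << 3;  // 5 -> 8 bits
    uint32_t g = ((p >> 5) & 0x3F) << 2;   // 6 -> 8 bits
    uint32_t b = (p & 0x1F) << 3;          // 5 -> 8 bits
    return (float)((r << 16) | (g << 8) | b);
}

// Fill out_ptr with `length` pixels of the centered cutout, starting at
// pixel index `offset` within the cutout.
void cutout_get_data(const uint16_t *frame, size_t offset, size_t length,
                     float *out_ptr) {
    const size_t x0 = (FRAME_BUFFER_COLS - CUTOUT_COLS) / 2;
    const size_t y0 = (FRAME_BUFFER_ROWS - CUTOUT_ROWS) / 2;
    for (size_t i = 0; i < length; i++) {
        size_t idx = offset + i;           // index inside the cutout
        size_t cx = idx % CUTOUT_COLS;
        size_t cy = idx / CUTOUT_COLS;
        size_t src = (y0 + cy) * FRAME_BUFFER_COLS + (x0 + cx);
        out_ptr[i] = rgb565_to_packed_rgb(frame[src]);
    }
}
```

So if your model expects a larger input than the cutout window, cropping alone loses the rest of the image; you would need an explicit resize step on top of this.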
2. No matter what kind of photos are taken, only the scores of two categories are changing. As shown in the figure below, the scores of dandelion & unknown are always zero. But when I save the photos and import them into the edge impulse project as testing dataset, there are scores for all four categories.
I’m investigating that last point, I had the same issue.
Maybe it comes from config.pixel_format = PIXFORMAT_JPEG;, since the cutout_get_data function converts RGB565 to RGB even though we’re using the JPEG format.
I’ll let you know as soon as I have something new.
hi @louis
Thank you for your reply. I’m also using the AI Thinker module. I set the framesize to 320x240 according to the QVGA format, so that’s not the problem. I just wonder why FRAMESIZE_240X240 doesn’t work.
I’ve looked at the cutout_get_data() function @janjongboom wrote in detail. I think this function converts the input RGB565 format into RGB888 format and crops the image to the required size, such as 48x48. But I noticed a problem: an RGB565 image uses 16-bit values, while the image buf we pass in is an array of 8-bit values, so the counts don’t match. For example, for a 240x240 image, the size of fb->buf should be 240 * 240 * 2 = 115200 bytes (if
we use config.pixel_format = PIXFORMAT_RGB565). We should first combine every two bytes into one RGB565 value, and then pass that into the cutout_get_data() function.
If we use config.pixel_format = PIXFORMAT_JPEG, shouldn’t the compressed buf be decoded back to RGB format before processing? Either way, I think this function needs some additional handling to work properly.
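The byte-pairing I mean could be sketched like this. Note the byte order is an assumption: the ESP32 camera driver typically delivers the high byte of each RGB565 pixel first, but you should verify this against your own frames.

```cpp
#include <cassert>
#include <cstdint>

// Combine two consecutive bytes of fb->buf into one RGB565 pixel.
// Assumed byte order: high byte first (big-endian), as the ESP32 camera
// driver usually delivers RGB565 frames; swap hi/lo if your frames differ.
static uint16_t pair_to_rgb565(uint8_t hi, uint8_t lo) {
    return (uint16_t)((hi << 8) | lo);
}

// Expand one RGB565 pixel into separate 8-bit R, G, B channels.
static void rgb565_to_rgb888(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b) {
    *r = (uint8_t)(((p >> 11) & 0x1F) << 3); // 5 -> 8 bits
    *g = (uint8_t)(((p >> 5) & 0x3F) << 2);  // 6 -> 8 bits
    *b = (uint8_t)((p & 0x1F) << 3);         // 5 -> 8 bits
}
```

With this convention, a 240x240 RGB565 frame is indeed 240 * 240 * 2 = 115200 bytes, and fb->buf[2*i] / fb->buf[2*i + 1] together form pixel i.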
Thanks again and look forward to new solutions.
Jifly
hey @janjongboom
Thanks for sharing this exciting news. I’d like to ask whether there are different ways of cropping/resizing the image during training. Do I need to account for these when programming, and feed the inference input in the format matching the method used?
Also, I just had an interesting reading on the ESP32 forum: https://www.esp32.com/viewtopic.php?t=17479
I haven’t had time to test the author’s various remarks with all the different frame formats, though.
I’ll come back to you when I have tested and gone further.
Hi @louis
I want to ask about the NN architecture in this image.
Can you explain the architecture step by step? I still don’t understand it.
I just created a repo under the Edge Impulse GitHub organisation with both the basic example you tested and a more advanced one (including a web UI to display the inference results):
Please find it here: https://github.com/edgeimpulse/example-esp32-cam
Note that when the output is 0.99609 it should be 1; I haven’t figured out why, though.
I’ve also noticed that when using a “random” or “unknown” class, the results are a bit biased.
I will try to ingest some pictures taken directly from the ESP32 camera, to make sure I train my model with the same “sensor” as the one I will use for inference.
Re: the 0.99609, it’s because the output layer of the network is quantized, so the highest value is 255 and the quantization scale is 0.00390625, and 255 * 0.00390625 = 0.99609 (rounded).
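The arithmetic behind that ceiling can be written out in a few lines. This is a generic sketch of uint8 dequantization (real_value = scale * (quantized - zero_point)); the zero point of 0 is an assumption for this particular model.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Dequantize a uint8 network output back to a float score:
// real_value = scale * (quantized - zero_point).
static float dequantize(uint8_t q, float scale, int zero_point) {
    return scale * (float)((int)q - zero_point);
}

// With scale = 1/256 = 0.00390625 and zero_point = 0, the maximum
// representable score is 255 * 0.00390625 = 0.99609375, which is printed
// as 0.99609; the quantized output can never reach exactly 1.0.
```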
hi @janjongboom
Well, I understand that. But my question is that no matter what photos I take, the output is the same. It seems the model does not behave as it does on the Edge Impulse platform, so I don’t know where the problem is.
“No matter what kind of photos are taken, only the scores of two categories are changing. As shown in the figure below, the scores of dandelion & unknown are always zero. But when I save the photos and import them into the edge impulse project as testing dataset, there are scores for all four categories.”