Arduino Portenta H7 Lite, Portenta H7 and the Vision Shield

Can some of the Edge Impulse superstars reply (@janjongboom, @dansitu, @aurel and others…)? I really want to have a vision model bigger than 96x96 pixels on the Portenta, and I am not that interested in accuracy at the moment (just while I test the memory flexibility Arduino is working on). I would like to trim the layers, and I am wondering which layers you think are most advantageous to reduce for model size without killing the basic machine learning ability. (As I solved my base question, see further down, let’s switch this to: which model should I try, and what reductions do you suggest?)

Here is an image of the present 96x96 compared to the vision area (the box is 96x96, skewed a little by the 128x64 Grove OLED).

This is a model-manipulation area I want to get better at, and I want to make a video about it for my students on my playlist, but I haven’t delved into it yet. Any suggestions for where to start?

A bit confused by the three vertical dots; not sure where to find them.

OK, found them.

Wow, the model is fairly easy to edit. I think I answered my own question: either reduce the density (16) of the main layer or pick another model.


I probably should not post this, as I kind of answered my own question, but maybe someone else is having the same issues. I still need suggestions for other models to try. I noticed a GRAYSCALE model, which should probably help.

So with a little playing I found:

It looks like its original size was 224x224, so I tried that and got a much bigger resolution, 224x224, working on the Portenta.

The model is far less accurate than the original 96x96, but from the image you can see that the sensing resolution of 224x224 is much larger than the original (skewed by the 128x64 pixel screen).

Now I just have to wait for Arduino to get memory flexibility working between the cores. I also need to work on simplifying the code.

Hi @Rocksetta! You can actually use all of the transfer learning models with any input size; they are trained and tested on 96x96, but they’ll work with whatever input size you prefer. There may be some impact on accuracy, but it’s often relatively minor.

Warmly,
Dan

@rjames

So I switched from my monochrome OLED to a 16-level grayscale 1.5 inch 128x128 Waveshare OLED and really like it. It is fairly fast, and I can see what I am analyzing.


@dansitu
This code simply detects a microcontroller, and the LED goes blue; it would go green on seeing the stapler. The box is the 96x96 classification window, and the code does a “cutout”. See the code below.


/**
 * This function is called by the classifier to get data.
 * We don't want a separate copy of the cutout, so we read from the frame buffer dynamically.
 */
int cutout_get_data(size_t offset, size_t length, float *out_ptr) {
    // offset and length operate on the *cutout*, so we map each cutout
    // pixel back to its location in the real frame buffer
    size_t bytes_left = length;
    size_t out_ptr_ix = 0;

    // read byte for byte
    while (bytes_left != 0) {
        // find the location of the byte in the cutout
        size_t cutout_row = offset / CUTOUT_COLS;   // integer division floors
        size_t cutout_col = offset - (cutout_row * CUTOUT_COLS);

        // then read the value from the real frame buffer
        size_t frame_buffer_row = cutout_row + cutout_row_start;
        size_t frame_buffer_col = cutout_col + cutout_col_start;

        // grab the grayscale value and replicate it into r/g/b
        uint8_t pixel = frame_buffer[(frame_buffer_row * FRAME_BUFFER_COLS) + frame_buffer_col];

        uint8_t r = pixel;
        uint8_t g = pixel;
        uint8_t b = pixel;

        // pack r/g/b into the float format the Edge Impulse classifier expects
        float pixel_f = (r << 16) + (g << 8) + b;
        out_ptr[out_ptr_ix] = pixel_f;

        // and go to the next pixel
        out_ptr_ix++;
        offset++;
        bytes_left--;
    }

    // and done!
    return 0;
}



I know that fancier code can do a “squish” (resize) instead of a “cutout” (crop); that is what Edge Impulse does when it first defines your model. Does anyone know how to do that on the Arduino? I hope to have the 320x320 camera activated soon. See GitHub here. With the extra memory of the Portenta, but not an increased heap, I think the 96x96 classification will be fine as long as we use more of the camera frame buffer, and a squish might help with that.
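To illustrate what I mean by a squish, here is a rough nearest-neighbor sketch I put together (hypothetical and untested; presumably a bilinear interpolation would look better):

// Hypothetical nearest-neighbor "squish" for a grayscale buffer (untested sketch)
void squish_grayscale(const uint8_t *src, int src_w, int src_h,
                      uint8_t *dst, int dst_w, int dst_h) {
    for (int y = 0; y < dst_h; y++) {
        int sy = y * src_h / dst_h;        // nearest source row
        for (int x = 0; x < dst_w; x++) {
            int sx = x * src_w / dst_w;    // nearest source column
            dst[y * dst_w + x] = src[sy * src_w + sx];
        }
    }
}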

Any suggestions or links I can look at?

Hi @Rocksetta, that’s awesome!

Yes, instead of cropping with cutout you can take a look at resizeImage() in nano_ble33_sense_camera.ino from our Arduino library export. Note that you can use a single buffer as both the input and output buffer.

You can also take a look at our crop_and_interpolate_rgb888() in edge-impulse-sdk/dsp/image/processing.cpp.

The difference between the two is that the former is our deprecated, ad-hoc function, while the latter will consolidate all our image-processing functions.

// Raul


Thanks so much, Raul @rjames. I will look at those. I just tested the Portenta in 320x320 camera view, and it crashes even at an M7 100:0 M4 memory split, so it must use the same heap the ML model is using. I will file an issue with Arduino to see if they have any suggestions.


Here is the issue I filed.

If anyone can add information or star it as important, that might help.

@rjames Good luck with all the technical stuff; it is out of my league, as I just simplify what already works. If I can help in any way, please let me know. (Please let me test what you’re working on when it is public.)

So I got working:

The Portenta 320x240 camera setting works with any Portenta memory split (I used M7 50:50 M4), along with the standard 96x96 Edge Impulse model using the cutout technique and the grayscale 128x128 1.5 inch OLED (Waveshare or Adafruit; I strongly suggest getting one, it is surprisingly fast). My latest sketch shows on screen when a classification match is recorded and counts how many were witnessed.

Code here

(Note the onscreen counter)
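The gist of the counter is roughly this (a simplified sketch, not the exact code in the linked repo; the 0.80 threshold is my assumption):

static int match_count = 0;

// after run_classifier() fills in `result`
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (result.classification[ix].value > 0.80f) {  // assumed confidence threshold
        match_count++;
        // draw result.classification[ix].label and match_count on the OLED here
    }
}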

What I would like to get working (and will need to wait for your version, Raul) is:

  1. Portenta 320x320 camera
  2. M7 75:25 M4 memory split (or 50:50 if possible)
  3. 1.5 inch 128x128 GrayScale OLED
  4. The newer Edge Impulse resizeImage code instead of the cutout code
  5. The largest-resolution Edge Impulse model possible, though 96x96 is fine if that is the best we can do.

Good luck with all the “you could have a huge heap by just using the Portenta_SDRAM library and replacing the calls to malloc and free with ea_malloc and ea_free” advice from Martino!
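If I read that hint right, the idea is presumably just this (untested sketch; I am assuming ea_malloc/ea_free take the usual malloc/free arguments):

#include <SDRAM.h>

void setup() {
    // initialise the SDRAM first
    SDRAM.begin(SDRAM_START_ADDRESS);

    // per Martino's hint: swap malloc/free for the SDRAM allocator
    uint8_t *big_buffer = (uint8_t *)ea_malloc(4 * 1024 * 1024);  // e.g. 4 MB
    // ... use big_buffer ...
    ea_free(big_buffer);
}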

If you need me to test code, just message @Rocksetta or put it in this thread.

Hi @Rocksetta, awesome work!!

In light of the new information on the other threads (and on GitHub), have you tried placing the frame buffer in SDRAM to be able to run larger models? Something like this…

#define EI_CAMERA_RAW_FRAME_BUFFER_COLS           320
#define EI_CAMERA_RAW_FRAME_BUFFER_ROWS           320

#define ALIGN_PTR(p,a)   ((p & (a-1)) ? (((uintptr_t)p + a) & ~(uintptr_t)(a-1)) : p)

static uint8_t *ei_camera_frame_mem;
static uint8_t *ei_camera_frame_buffer; // 32-byte aligned

Do this only once:

    // initialise the entire SDRAM (or choose the amount required)
    SDRAM.begin(SDRAM_START_ADDRESS);

This can be done in a loop:

    ei_camera_frame_mem = (uint8_t *) SDRAM.malloc(EI_CAMERA_RAW_FRAME_BUFFER_COLS * EI_CAMERA_RAW_FRAME_BUFFER_ROWS + 32 /*alignment*/);
    if(ei_camera_frame_mem == NULL) {
        ei_printf("failed to create ei_camera_frame_mem\r\n");
        return false;
    }
    ei_camera_frame_buffer = (uint8_t *)ALIGN_PTR((uintptr_t)ei_camera_frame_mem, 32);

    // do some work

    SDRAM.free(ei_camera_frame_mem);



Thanks Raul, talk Monday. I made a sketch named after you on my GitHub.

That was surprisingly easy. Yes, SDRAM works very well for the camera at 320x320, with no speed slowdown. I used my own code, which probably has memory leaks.


#include <SDRAM.h>

SDRAMClass mySDRAM;

uint8_t *sdram_frame_buffer;
//uint8_t frame_buffer[320*320];  // the old internal-RAM buffer this replaces

void setup() {
  //Serial.begin(921600);
  Serial.begin(115200);

  mySDRAM.begin();
  // allocate the full 320x320 camera frame buffer in external SDRAM
  sdram_frame_buffer = (uint8_t *)mySDRAM.malloc(320 * 320 * sizeof(uint8_t));
}



Then in the main loop I did:

if (cam.grab(sdram_frame_buffer) == 0){...

It crashes occasionally, probably because nothing is aligned, but working proof is always good.
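A safer version would presumably over-allocate and align the pointer with Raul’s ALIGN_PTR macro from above (untested sketch):

// over-allocate by 32 bytes, then hand the camera a 32-byte-aligned pointer
uint8_t *frame_mem = (uint8_t *)mySDRAM.malloc(320 * 320 + 32 /*alignment*/);
uint8_t *frame_buffer = (uint8_t *)ALIGN_PTR((uintptr_t)frame_mem, 32);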

Question for @rjames, @dansitu, @janjongboom or anyone else that is interested.

With this extra 8 MB of SDRAM on the Portenta, could we load multiple models into SDRAM and switch between them?

I did something similar years ago with JavaScript and two MNIST models here.

Presently I don’t know where the Edge Impulse model buffer is formed, so I can’t yet switch it to SDRAM, but I will look for it this weekend.

Nice when things work. :smiley: The 320x320 Portenta camera, classifying with a 96x96 Edge Impulse model and showing on a 128x128 grayscale OLED, using an M7:M4 50:50 memory split.

This could not have run without using SDRAM for the camera frame buffer!



So this is very exciting: using a slightly changed version of @rjames’s code above, I got SDRAM working for the Portenta camera using 320x320 grayscale, with the 96x96 Edge Impulse model and the 128x128 grayscale OLED, here. Thanks so much, Raul, for the alignment code snippet; I did not find anything even remotely like it on the web.

The speed that you can move the camera and get detections is amazing for a microcontroller.

I looked into putting the Edge Impulse model into SDRAM, but I could not find a main buffer to store there. I also looked at TensorFlow Lite, which has a much simpler model layout:

unsigned int model_tflite_len = 2640;

const unsigned char model_tflite[] = {...}

but I could not figure out how to put the unsigned char array into SDRAM.
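Presumably (untested) one could copy the const array from flash into an SDRAM allocation and point the TensorFlow Lite interpreter at that copy. A rough sketch, assuming the Portenta SDRAM library and a standard tflite-micro setup:

#include <SDRAM.h>
#include <string.h>

// model_tflite / model_tflite_len come from the exported model header
extern const unsigned char model_tflite[];
extern unsigned int model_tflite_len;

SDRAMClass sdram;

const unsigned char *load_model_into_sdram() {
    sdram.begin();
    uint8_t *sdram_model = (uint8_t *)sdram.malloc(model_tflite_len);
    if (sdram_model == NULL) {
        return NULL;  // allocation failed
    }
    memcpy(sdram_model, model_tflite, model_tflite_len);  // copy flash -> SDRAM
    return sdram_model;  // pass this to tflite::GetModel() instead of model_tflite
}

Whether the interpreter is happy reading the model out of SDRAM is exactly what I would want to test.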

Anyway, so far this is all really good news. I just have to figure out how to go from CUTOUT, which uses only 96x96 pixels, to RESIZEIMAGE, which squishes the full 320x320 image down to 96x96, and I am a bit stuck here. I am not sure if I can just replace the cutout function with the resize-image function. I may need a bit of help with this part.

Once again thanks Raul @rjames


Hello @Rocksetta,

Here is a function that I have been using on the ESP32-CAM: image_resize_linear.
It is a bit tricky to find on Google, but this image_util does a pretty good job: https://github.com/espressif/esp-dl/blob/420fc7e219ba98e40a5493b9d4be270db2f2d724/image_util/image_util.c
Feel free to use some of the functions if it can help you.

Regards,

Louis

Thanks so much @louis, that code is fairly understandable and should be a good base, along with the two versions from Edge Impulse above.

void image_resize_linear(uint8_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h)

Just checking: is int dst_c the number of channels, meaning RGB is 3 and GRAYSCALE is 1?

Somehow I logged in differently; this is still @rocksetta.

This seems very wrong. @louis, I am struggling with how to get the out_ptr into the features_signal structure. This is what I have tried so far; it seems very wasteful of memory. It does compile, but it flashes red on loading to the Portenta.

    int myCamResult = myCam.grab(sdram_frame_buffer); // myCamResult should be zero

    // the features are stored in flash, and we don't want to load everything into RAM
    signal_t features_signal;
    features_signal.total_length = CUTOUT_COLS * CUTOUT_ROWS;
    float model_input_buffer[features_signal.total_length]; // ~36 KB on the stack, probably part of the problem

    //features_signal.get_data = &cutout_get_data;   // this activated the old code

    // somehow activate resize???
    // void image_resize_linear(uint8_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h)

    uint8_t *tmp_buffer = (uint8_t *) malloc(features_signal.total_length);

    image_resize_linear(tmp_buffer, sdram_frame_buffer, 96, 96, 1 /* 1 = GRAYSCALE */, 320, 320);

    // convert uint8_t to float
    for (int i = 0; i < features_signal.total_length; i++) {
        model_input_buffer[i] = tmp_buffer[i] / 255.0f;
    }

    // this call is almost certainly wrong: get_data is a callback the classifier
    // invokes, not a function to call directly with a filled buffer
    features_signal.get_data((unsigned int)sizeof(uint8_t), (unsigned int)features_signal.total_length, model_input_buffer);

    free(tmp_buffer);

Hello @Rocksetta,

I just checked how I did the project using the ESP32-CAM to see if there were any differences. In fact, the way I allocated the pointer using the Espressif SDK was with the function dl_matrix3du_alloc.
This will be different on the Portenta.

Also, to answer you on this parameter: int dst_c is the channel count (so 3 for RGB). In your case, I see that you are using grayscale, and from what I see in the header file (https://github.com/espressif/esp-dl/blob/420fc7e219ba98e40a5493b9d4be270db2f2d724/image_util/include/image_util.h):

/**
 * @brief Resize the image in RGB888 format via bilinear interpolation
 *
 * @param dst_image    The output image
 * @param src_image    Source image
 * @param dst_w        Width of the output image
 * @param dst_h        Height of the output image
 * @param dst_c        Channel of the output image
 * @param src_w        Width of the source image
 * @param src_h        Height of the source image
 */
void image_resize_linear(uint8_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h);

It has to be in RGB888 format to use this function… I might have misread your initial question.
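If you did want to use it on the Portenta’s grayscale frames, you could presumably expand each frame to RGB888 first. A hypothetical sketch (the rgb buffer must be three times the grayscale size):

// Hypothetical grayscale -> RGB888 expansion so image_resize_linear can be used
void gray_to_rgb888(const uint8_t *gray, uint8_t *rgb, int w, int h) {
    for (int i = 0; i < w * h; i++) {
        rgb[3 * i + 0] = gray[i];  // R
        rgb[3 * i + 1] = gray[i];  // G
        rgb[3 * i + 2] = gray[i];  // B
    }
}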

If needed, here is an Arduino sketch I wrote to show how it works with the ESP32-CAM + Edge Impulse.

Regards,

Louis


@Rocksetta,

If you want to only resize (and not crop) from 320x320 to 96x96, you can try this (on the Portenta H7).
I’ve used magic numbers just to be explicit.

cam.grab(ei_camera_frame_buffer);

resizeImage(320, 320,                // <-- input buf resolution
            ei_camera_frame_buffer, // <-- input buf
            96, 96,                  // <-- output buf resolution
            ei_camera_frame_buffer, // <-- output buf can be the same
            8);                      // <-- bits per pixel

ei::signal_t signal;
signal.total_length = 96 * 96; // <-- in pixels
signal.get_data = &ei_camera_cutout_get_data;

static inline void mono_to_rgb(uint8_t mono_data, uint8_t *r, uint8_t *g, uint8_t *b) {
    uint8_t v = mono_data;
    *r = *g = *b = v;
}

int ei_camera_cutout_get_data(size_t offset, size_t length, float *out_ptr) {
    size_t bytes_left = length;
    size_t out_ptr_ix = 0;

    // read byte for byte
    while (bytes_left != 0) {

        // grab the value and convert to r/g/b
        uint8_t pixel = ei_camera_frame_buffer[offset];

        uint8_t r, g, b;
        mono_to_rgb(pixel, &r, &g, &b);

        // then convert to out_ptr format
        float pixel_f = (r << 16) + (g << 8) + b;
        out_ptr[out_ptr_ix] = pixel_f;

        // and go to the next pixel
        out_ptr_ix++;
        offset++;
        bytes_left--;
    }

    // and done!
    return 0;
}
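For completeness, the signal then feeds run_classifier() in the usual pattern from the library export (this part is not code Raul posted here, just the standard call):

ei_impulse_result_t result = { 0 };
EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false /* debug */);
if (res != EI_IMPULSE_OK) {
    ei_printf("ERR: failed to run classifier (%d)\r\n", res);
}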


You’re amazing, @rjames :100:, thanks so much for the code examples. This is the first time I have ever had the Arduino Portenta using all 320x320 camera pixels, resized down to a 96x96 Edge Impulse model, with the Portenta’s SDRAM used for the camera buffer.

Look how fast this little beast processes.

My version of Raul’s code is here.


That’s awesome news! :smiley:

// Raul


Raul @rjames, any hints about testing multi-object “bounding boxes” object detection instead of what we have working now, single-object “one label per data item”?

I know from my work with WASM that the differences between the two types of code are very small. I am wondering if this line still grabs the result correctly:



    EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false /* debug */);

Also, the result structure has changed for multi-object, but I can’t find any information about the new structure other than the header file ei_classifier_types.h. From my WASM work, I think the calls look something like this:


result.results.length

result.results[i].value
result.results[i].label
result.results[i].x, result.results[i].y, result.results[i].width, result.results[i].height

result.anomaly

but from the header file it is probably more like:

result.bounding_boxes[i].x
result.bounding_boxes[i].y
result.bounding_boxes[i].width
result.bounding_boxes[i].height
result.bounding_boxes[i].value
result.bounding_boxes[i].label

result.timing.sampling
result.timing.dsp
result.timing.classification
result.timing.anomaly

result.anomaly
Any confirmation here? I’m not expecting bounding boxes to work well; just wondering if it works at all on the Portenta. It was very slow in the web browser.
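If the header is right, I’d guess the read-out loop looks something like the sketch below (untested until the feature is live; EI_CLASSIFIER_OBJECT_DETECTION_COUNT is my assumption for the array length):

// untested guess at reading multi-object results, based on ei_classifier_types.h
for (size_t i = 0; i < EI_CLASSIFIER_OBJECT_DETECTION_COUNT; i++) {
    auto bb = result.bounding_boxes[i];
    if (bb.value == 0) continue; // skip empty slots
    ei_printf("%s (%f) [ x: %u, y: %u, width: %u, height: %u ]\r\n",
              bb.label, bb.value, bb.x, bb.y, bb.width, bb.height);
}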

@Rocksetta, this won’t work until the new constrained object detection is live.