Incorporating model into another C++ project (Linux)

jarnold1 · November 11, 2022, 4:37pm

Hi, I am using a combination of different sensors as input data to a model that is meant to run very quickly (within 1ms). The target device is a BeagleBone Black microcomputer running Debian GNU/Linux 10 (buster). There is already a large codebase (C++) that reads in sensor data and uses that sensor data to control a robot, and I’m looking for feedback or examples of the best way for me to incorporate a model trained on the Edge Impulse website into this project.

The pipeline for collecting and logging sensor data already exists in the C++ codebase on the BeagleBone, so I used that pipeline, uploaded the data files to Edge Impulse Studio, and trained a model. Edge Impulse Studio shows the inference can run in 1ms (my target speed). I’m not sure whether the integration steps I took next are the ideal ones for implementing the model in my project on the BeagleBone to test in real time:

I exported the model to a C++ library. However, I also looked into selecting the “Linux Board” option, but wasn’t sure. Which of these options is recommended given my hardware and goal of implementing the model into a pre-existing codebase?
I then compiled the C++ library on the BeagleBone, following the steps here: As a generic C++ library - Edge Impulse Documentation. I slightly modified this so that I can read in a data file with the input data and then the compiled app outputs the model results. One thing that is not ideal about this step is that the compilation is very long. I can see that many of the files included in the library are ML operations that I don’t need for my model (e.g., depthwise conv takes a long time to compile, but I’m not using it). Is there a way to speed this up, or a different method of compiling it?
Finally I call the compiled app from the pre-existing C++ codebase. The aspect of this implementation that I like is that I do not need to recompile the model with the C++ codebase. I can test the model by itself with data I’ve collected before, or I can run it with real-time data. However, I feel that by calling the compiled model and reading the output, I may be adding additional time needed for each inference. Is there a better way to integrate the model into the C++ codebase?

Any feedback on my implementation you can provide would be greatly appreciated. Currently, the model takes about 500ms per inference when deployed on the BeagleBone. The .eim model seems like an alternative, but I’m not sure where to start or if it will solve the issues I’ve described above.

jarnold1 · November 14, 2022, 9:55pm

To add some extra details: I assume some of the overhead comes from calling the model executable from another C++ project. However, when using the time command in Linux, I can see that the predictions still take well over 1ms for a single inference.

debian@beaglebone:~/ time ./build/app
Predictions:
...

real 0m0.077s
user 0m0.012s
sys 0m0.011s
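
The time command measures the whole process, including start-up and loading, so it cannot isolate a single inference. One way to separate the two is to time the call from inside the program. The following is a minimal sketch, assuming a hypothetical fake_inference() stand-in for the real model call; neither it nor mean_latency_us() is part of the Edge Impulse SDK:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real inference call; it just sums the inputs.
static float fake_inference(const std::vector<float>& features) {
    float acc = 0.0f;
    for (float f : features) acc += f;
    return acc;
}

// Returns the mean wall-clock time per call, in microseconds, over `runs` calls.
// Returns 0 when `runs` is 0 so the caller never divides by zero.
template <typename Fn>
double mean_latency_us(Fn&& fn, std::size_t runs) {
    if (runs == 0) return 0.0;
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const auto start = clock::now();
    for (std::size_t i = 0; i < runs; ++i) fn();
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / static_cast<double>(runs);
}
```

Averaging over many calls smooths out timer resolution and scheduling noise; writing the result to a volatile variable in the caller keeps the compiler from optimizing the call away.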

shawn_edgeimpulse · November 15, 2022, 5:28pm

Hi @jarnold1,

The “Linux Board” deployment option simply tells you to use the CLI tools to download the trained .eim model file to your board. The .eim file contains the preprocessing and model steps pre-compiled for your architecture. One benefit of using the .eim file is that it can run as a “runner” in a separate process from your main code. This allows you to multi-process your application to collect data and perform inference. A C++ example with the .eim file can be found here: example-standalone-inferencing-linux/eim.cpp at master · edgeimpulse/example-standalone-inferencing-linux · GitHub.
Compilation is lengthy because of how big the Edge Impulse library is. Unfortunately, there’s no good way to reduce the size of that SDK library. However, you can speed up compilation by doing a multi-threaded make with the command “make -j” or you can look into cross-compiling your application on a full computer before transferring it to the BeagleBone.
If you want to perform inference with your model, I don’t think there’s any way around calling your model and reading the output (e.g. with the run_classifier() function). There’s some slight overhead to performing a function call, but the code to perform inference is quite efficient. The other option is to look into using the .eim file as a runner to perform inference in a separate process. Note that inference is still single-threaded, so it won’t necessarily offer any speed-up over calling it in-line with your C++ code.

Could you give some information about what you’re trying to do? 500ms per inference on a BeagleBone seems like a lot, but that might be normal if you’re, say, doing object detection with MobileNet-SSD. I only get 1-2 fps with MobileNet-SSD on a Raspberry Pi, so I would expect similar on a BeagleBone. If you provide your project ID, I can take a look at your dataset and model to hopefully offer some insights as to why inference might be slow.

jarnold1 · November 18, 2022, 4:23pm

Hi @shawn_edgeimpulse, thank you for the detailed response!

I got the model to run much faster ~1-10ms per inference by incorporating my pre-existing C++ codebase into the standalone inference example project (rather than when I previously compiled a standalone inference executable and called that from the pre-existing C++ codebase). Most of the inference latency was likely a result of me capturing the std::cout of the model executable rather than integrating it into the pre-existing C++ codebase.
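
To illustrate the difference between the two approaches, here is a rough sketch (POSIX only). capture_stdout() pays for a process spawn, a pipe, and text parsing on every call, while the in-process function returns the value directly. Both function names and the 0.5 score are hypothetical and stand in for the real executable and the real model call:

```cpp
#include <cassert>
#include <cstdlib>
#include <stdio.h>   // popen/pclose are POSIX, declared in <stdio.h>
#include <string>

// Runs `command` through the shell and returns everything it wrote to stdout.
// Returns an empty string if the process could not be started.
std::string capture_stdout(const char* command) {
    std::string output;
    FILE* pipe = popen(command, "r");
    if (pipe == nullptr) return output;
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    return output;
}

// Hypothetical in-process equivalent: the caller gets the score directly,
// with no process spawn, pipe, or string-to-float conversion.
float classify_in_process() { return 0.5f; }
```

A caller using the first approach would do something like std::strtof(capture_stdout("./build/app").c_str(), nullptr) for every inference, which is where the extra latency comes from.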

The most difficult part was modifying the Makefile to compile the Edge Impulse generated code and the pre-existing C++ codebase, since the codebase I have is a multi-threaded application written in C++11. Once I got this, it compiled for a long time initially (when compiling the edge-impulse-sdk), but luckily this is only an issue for the first compilation (so I can still modify files not related to the model inference).

Some new questions:

When I create a new model (say I want to change the number of input features), do I only need to replace the model-parameters and tflite-model directories? I would prefer to not need to recompile edge-impulse-sdk each time I iterate on the model. Another way to ask the question: each time C++ code is exported from Edge Impulse studio, what files are always the same? Do some change based on the hardware you select during training?
If I wanted to implement two completely different models (different input sensor data, different output inference classes), how would you recommend doing that? My impression is that some global variables (e.g., EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE) would break if I had multiple sets of model-parameters and tflite-model directories.
How do I use run_classifier_continuous()? Right now I define a std::queue<float> that stores the most recent data (and removes the old frames), and pass this to run_classifier(). My impression is run_classifier_continuous() would be simpler but I can’t find an example of how to use it.
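
For reference, the manual buffering described in the last question can be sketched roughly as follows. This is not run_classifier_continuous(); it only shows a fixed-size window of the most recent samples that can be copied into a flat array before each inference. SlidingWindow is a hypothetical name, and std::deque is used here instead of std::queue because it allows iteration:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Fixed-capacity window holding the most recent `capacity` samples.
class SlidingWindow {
public:
    explicit SlidingWindow(std::size_t capacity) : capacity_(capacity) {}

    // Appends a sample, dropping the oldest one once the window is full.
    void push(float sample) {
        if (capacity_ == 0) return;
        if (samples_.size() == capacity_) samples_.pop_front();
        samples_.push_back(sample);
    }

    // True once enough samples have arrived to fill one inference window.
    bool full() const { return samples_.size() == capacity_; }

    // Copies the window, oldest sample first, into a contiguous buffer
    // suitable for a classifier that expects a flat float array.
    std::vector<float> snapshot() const {
        return std::vector<float>(samples_.begin(), samples_.end());
    }

private:
    std::size_t capacity_;
    std::deque<float> samples_;
};
```

In a multi-threaded application the window would also need a mutex, or a single owning thread, since std::deque is not safe for concurrent writes.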

shawn_edgeimpulse · November 22, 2022, 2:57pm

Hi @jarnold1,

It’s highly recommended that you replace all 3 directories in the SDK .zip download, as code can change. I don’t know exactly which files change. You’re welcome to try replacing just those 2 directories, but note we don’t recommend it. First-time compilation of just the Edge Impulse SDK should take ~30 seconds on a modern machine if you use the ‘-j’ flag with make. This may be limited by other cross-compilers, however.
We do not support 2 models in one project at the moment. As you have noted, some of our parameters in that file would need to change.
Please see our API documentation for using run_classifier_continuous(): run_classifier_continuous() - Edge Impulse API. Note there is a link to an example at the bottom of the page.

MMarcial · November 22, 2022, 10:51pm

@shawn_edgeimpulse I found that when working with https://github.com/edgeimpulse/firmware-sony-spresense I had to do a merge copy. Else the app would not compile.

shawn_edgeimpulse · November 29, 2022, 3:59pm

@MMarcial Thanks for pointing that out! I do not have much experience with the Spresense.