LOG_SOFTMAX tflite for MCU support

Hello,

It seems that TensorFlow Lite for Microcontrollers does not support the LOG_SOFTMAX opcode (LogSoftmax from PyTorch).
Is there a way to define a custom function to replace it?
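
For example, I was wondering whether a decomposition like this could work as a drop-in replacement (just an idea on my side, not tested on target):

import torch

def manual_log_softmax(x, dim=1):
    # log_softmax(x) = x - logsumexp(x), expressed with more basic ops
    return x - torch.logsumexp(x, dim=dim, keepdim=True)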

Thanks

Hello @adamsantamaria,

I can see it here: tflite-micro/log_softmax.h at main · tensorflow/tflite-micro · GitHub

Could you provide some more info on how you’re converting your PyTorch model to TFLite?
In parallel, I’m checking with our ML team to see if we need to enable something custom on our end.

Best,

Louis

Hello @louis,

Thanks for your answer.
Glad to hear the function exists.

I am exporting my PyTorch model, named model, to ONNX format this way:

# exporting the model to ONNX format
print("Exporting the model to ONNX format...")
model.eval()
dummy_input = torch.randn(1, 3, 15000)
input_names = ["actual_input"]
output_names = ["output"]
torch.onnx.export(
    model,
    dummy_input,
    f"{MODELS_DIR}/activity.onnx",
    verbose=False,
    input_names=input_names,
    output_names=output_names,
    export_params=True,
)

Then, I use the generated .onnx file as a parameter for the model.profile and model.deploy functions:

try:
    profile = ei.model.profile(
        model=f"{MODELS_DIR}/activity.onnx", device='cortex-m4f-80mhz'
    )
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")
try:
    deploy_bytes = ei.model.deploy(
        model=f"{MODELS_DIR}/activity.onnx",
        model_output_type=model_output_type,
        deploy_target="zip",
        engine="tflite"
    )
except Exception as e:
    print(f"Could not deploy: {e}")

pip freeze outputs:

tensorflow==2.12.0
edgeimpulse==1.0.4
edgeimpulse-api==1.23.6
torch==1.8.1+cpu
torch-summary==1.4.5
onnx==1.14.0
onnxruntime==1.15.0

Thanks

Hello @adamsantamaria,

Just had a deeper look this morning; could you try setting opset_version=9 (or higher, I think the latest is 18) in your torch.onnx.export call?

It seems that it is supported starting from that version: pytorch/symbolic_opset9.py at main · pytorch/pytorch · GitHub

This GH issue pointed me in that direction: RuntimeError: ONNX export failed: Couldn't export operator aten::softmax · Issue #20643 · pytorch/pytorch · GitHub.
I have not had time to reproduce it myself, though.

Best,

Louis

Hi Louis,

Changing the opset_version did not solve my problem (the default value is 14).
When I enable verbose mode in torch.onnx.export,

torch.onnx.export(
    model,
    dummy_input,
    f"{MODELS_DIR}/activity.onnx",
    verbose=True,
    input_names=input_names,
    output_names=output_names,
    export_params=True,
    opset_version=12,
)

I don’t get any message about not finding the log_softmax opcode:

%output : Float(1, 6, strides=[6, 1], requires_grad=1, device=cpu) = onnx::LogSoftmax[axis=1](%19) # /home/asantamaria/.pyenv/versions/activity-3.8.16/lib/python3.8/site-packages/torch/nn/functional.py:1672:0

The problem does not seem to come from ONNX.

Hi Adam,

Could you share your model and/or your source code with some data samples so I can try to replicate your issue, please?

Also, I’ve seen that workaround:

Could you replace nn.LogSoftmax(dim=1) with the following code and retrain the network?

import torch.nn.functional as F
F.log_softmax(input,1)
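
As a rough sketch, in a forward method it could look like this (the class and layer names below are just placeholders, adapt them to your model):

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # placeholder model class
    def __init__(self, in_features=64, num_classes=6):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        x = self.fc(x)
        # functional call instead of a separate nn.LogSoftmax(dim=1) layer
        return F.log_softmax(x, dim=1)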

In parallel, I’ll check if the issue comes from how we convert the ONNX model to TFLite when you import it.

Best,

Louis

Thanks Louis for your help,

I can’t share my full model/code with you for now, but I have set up a mock script to replicate the issue:

import edgeimpulse as ei
import torch
import torch.nn as nn

# define a model featuring only one logsoftmax layer (input_size=2*3)
input = torch.randn(2, 3)
m = nn.LogSoftmax(dim=1)

# check input/output relation
print(input)
print(m(input))

# exporting the model to ONNX format
print("Exporting the model to ONNX format...")
m.eval()
dummy_input = torch.randn(1, 2, 3)
input_names = ["actual_input"]
output_names = ["output"]
torch.onnx.export(
    m,
    dummy_input,
    "activity.onnx",
    verbose=True,
    input_names=input_names,
    output_names=output_names,
    export_params=True,
    opset_version=11,
)

# edgeimpulse API key
ei.API_KEY = "XXX"

# estimate the RAM, ROM, and inference time for our model on the target hardware family
try:
    profile = ei.model.profile(
        model="activity.onnx", device='cortex-m4f-80mhz'
    )
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")

You can try it with your own API key.
On my side, I still get the message Unsupported ops: LOG_SOFTMAX in the model.profile output:

Target results for float32:
===========================
{
    "device": "cortex-m4f-80mhz",
    "tfliteFileSizeBytes": 1112,
    "isSupportedOnMcu": false,
    "timePerInferenceMs": 1,
    "mcuSupportError": "Unsupported ops: LOG_SOFTMAX."
}

Also, I can’t use F.log_softmax(input,1) as a model.
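
The closest I can get is wrapping it in a minimal nn.Module (sketch below), but as far as I can tell that exports to the same LogSoftmax op anyway:

import torch.nn as nn
import torch.nn.functional as F

class LogSoftmaxOnly(nn.Module):
    # minimal wrapper so the functional call can be exported as a model
    def forward(self, x):
        return F.log_softmax(x, dim=1)

# exporting this wrapper traces to the same onnx::LogSoftmax node,
# so it should end up as the same unsupported LOG_SOFTMAX op on the TFLite side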

Thx!

I observe the same thing with a Keras model, so the problem should not come from ONNX.

import edgeimpulse as ei
import torch
import torch.nn as nn

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# define a model featuring only one logsoftmax layer (input_size=2*3)
pt_tensor = torch.randn(2, 3)
np_tensor = pt_tensor.numpy()
tf_tensor = tf.convert_to_tensor(np_tensor)

pt_model = nn.LogSoftmax(dim=1)
tf_model = keras.Sequential([layers.Activation('log_softmax')])

# check input/output relation
print(pt_tensor)
print(pt_model(pt_tensor))
print(tf_tensor)
print(tf_model(tf_tensor))

# exporting the model to ONNX format
print("Exporting the model to ONNX format...")
pt_model.eval()
dummy_input = torch.randn(1, 2, 3)
input_names = ["actual_input"]
output_names = ["output"]
torch.onnx.export(
    pt_model,
    dummy_input,
    "activity_pt.onnx",
    verbose=True,
    input_names=input_names,
    output_names=output_names,
    export_params=True,
    opset_version=11,
)

# edgeimpulse API key
ei.API_KEY = "XXX"

# estimate the RAM, ROM, and inference time for our model on the target hardware family
try:
    profile = ei.model.profile(
        model="activity_pt.onnx", device='cortex-m4f-80mhz'
    )
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")

try:
    profile = ei.model.profile(
        model=tf_model, device='cortex-m4f-80mhz'
    )
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")

Hi @adamsantamaria,

Indeed, the op has not been ported to our internal codebase yet.
It will be added by the end of this week; we’ll keep you posted!

Aurelien

Hello @adamsantamaria,

@janjongboom added the log softmax op; it passed all the tests and should be merged today.
I’ll test with your code sample when it’s ready.

Best,

Louis

Hello @adamsantamaria,

The Log Softmax op is now available.
I just managed to profile your “activity.onnx” test model:

Target results for float32:
===========================
{
    "device": "cortex-m4f-80mhz",
    "tfliteFileSizeBytes": 1112,
    "isSupportedOnMcu": true,
    "memory": {
        "tflite": {
            "ram": 2132,
            "rom": 24360,
            "arenaSize": 1964
        },
        "eon": {
            "ram": 760,
            "rom": 11824
        }
    },
    "timePerInferenceMs": 1
}


Performance on device types:
============================
{
    "variant": "float32",
    "lowEndMcu": {
        "description": "Estimate for a Cortex-M0+ or similar, running at 40MHz",
        "timePerInferenceMs": 24,
        "memory": {
            "tflite": {
                "ram": 2132,
                "rom": 24360
            },
            "eon": {
                "ram": 760,
                "rom": 11824
            }
        },
        "supported": true
    },
    "highEndMcu": {
        "description": "Estimate for a Cortex-M7 or other high-end MCU/DSP, running at 240MHz",
        "timePerInferenceMs": 2,
        "memory": {
            "tflite": {
                "ram": 2132,
                "rom": 24360
            },
            "eon": {
                "ram": 760,
                "rom": 11824
            }
        },
        "supported": true
    },
    "highEndMcuPlusAccelerator": {
        "description": "Most accelerators only accelerate quantized models.",
        "timePerInferenceMs": 2,
        "memory": {
            "tflite": {
                "ram": 2132,
                "rom": 24360
            },
            "eon": {
                "ram": 760,
                "rom": 11824
            }
        },
        "supported": true
    },
    "mpu": {
        "description": "Estimate for a Cortex-A72, x86 or other mid-range microprocessor running at 1.5GHz",
        "timePerInferenceMs": 1,
        "rom": 1112.0,
        "supported": true
    },
    "gpuOrMpuAccelerator": {
        "description": "Estimate for a GPU or high-end neural network accelerator",
        "timePerInferenceMs": 1,
        "rom": 1112.0,
        "supported": true
    }
}
None

Best,

Louis

Hi @louis,

It is also working on my side :slight_smile:
I will be able to move forward.
Thanks for your help.

Adam
