SOLVED: How to run inference with TFLITE model file from Python (Win10)?

As this is more about Python and less about Edge Impulse, this forum is perhaps not the correct place to ask this, but if anyone has some pointers it’d be great!
I have a .tflite model file, created with EI of course, and now I'd like to use that model for inference from inside Python on a Windows computer. There's a Linux SDK, but I guess that doesn't help here.
Each sensor reading consists of 20 float values between -2 and +2.

Can you provide more information on what the model is doing?

Sure. I have 20 sensors which produce values between -2 and +2, and the sampling frequency is 10 Hz.
The sample files I have uploaded have 20 rows (and 20 columns), that is, 2 seconds' worth of data per file. I have only two classes so far, open and closed.
I've been able to successfully train a model in EI, and now I would like to use it "offline" from Windows/Python via the .tflite model that can be downloaded from the Dashboard. Once I get this working, I'll create a lot more training data.

OK, I'm not there yet, but perhaps on the correct path. It seems that by installing Anaconda and TensorFlow, TensorFlow Lite also becomes available; in the end I was able to run the webcam detection from this tutorial.
Then I stumbled on this tutorial (which @janjongboom had referred to earlier). It is not as overwhelming as the first one, and now I just need to understand how to get my 20 x 20 input values, and the 2 labels, into the program below instead of the image handling; my rough idea for that is sketched after the program.

Any pointers are appreciated. I'll continue to investigate and update this post until the problem is resolved (or, god forbid, until I've given up…)

import numpy as np
import tensorflow as tf
import cv2
import pathlib

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.allocate_tensors()

# input details
print(input_details)

# output details
print(output_details)

folder_path = "pictures"
for file in pathlib.Path(folder_path).iterdir():
    # read and resize the image
    img = cv2.imread(r"{}".format(file.resolve()))
    new_img = cv2.resize(img, (224, 224))

    # input_details[0]['index'] = the index which accepts the input
    interpreter.set_tensor(input_details[0]['index'], [new_img])

    # run the inference
    interpreter.invoke()

    # output_details[0]['index'] = the index which provides the output
    output_data = interpreter.get_tensor(output_details[0]['index'])

    print("For file {}, the output is {}".format(file.stem, output_data))

Hi @ThomasVikstrom,

As you pointed out, you can modify the example you found to perform inference on your data. I hope you don’t mind, but I downloaded the model from your “Muse Wave 01” project to test this. You can copy the processed features (which match the raw features, as you are using a “Raw” processing block) into a Python program to test inference. I tested this on Windows 10:

import numpy as np
import tensorflow as tf

# Location of tflite model file
model_path = "ei-muse-wave-01-nn-classifier-tensorflow-lite-float32-model.lite"

# Processed features (copy from Edge Impulse Project)
features = [0.7927, -0.2803, 0.2426, 1.1957, 0.1832, -0.4312, -0.2888, 0.6316, 0.0266, -0.2375, -0.4766, -0.0732, 0.0815, -0.6838, -0.5249, 0.3224, -0.6926, -0.7769, -0.8965, -0.8540, 0.7249, -0.3096, 0.2409, 1.1851, 0.1547, -0.4215, -0.3107, 0.6207, 0.0885, -0.1691, -0.4577, -0.0754, 0.0534, -0.6540, -0.5097, 0.3051, -0.7094, -0.7743, -0.8956, -0.8406, 0.6380, -0.3140, 0.2326, 1.1787, 0.1081, -0.4135, -0.3519, 0.5960, 0.1306, -0.1214, -0.4456, -0.0777, 0.0063, -0.6319, -0.4972, 0.2696, -0.7138, -0.7731, -0.8930, -0.8294, 0.5294, -0.2822, 0.2082, 1.1693, 0.0404, -0.4057, -0.4064, 0.5531, 0.1525, -0.0961, -0.4384, -0.0848, -0.0501, -0.6189, -0.4968, 0.2133, -0.7066, -0.7772, -0.8917, -0.8165, 0.3520, -0.2851, 0.1340, 1.1502, -0.0344, -0.3974, -0.4610, 0.4919, 0.1604, -0.0902, -0.4364, -0.1004, -0.0968, -0.6173, -0.5120, 0.1438, -0.6900, -0.7871, -0.8955, -0.8042, 0.2457, -0.3094, 0.0443, 1.1325, -0.0898, -0.3894, -0.5015, 0.4208, 0.1580, -0.1000, -0.4409, -0.1212, -0.1253, -0.6240, -0.5454, 0.0756, -0.6656, -0.7996, -0.9053, -0.7973, 0.2175, -0.2905, -0.0239, 1.1209, -0.1109, -0.3830, -0.5196, 0.3532, 0.1524, -0.1202, -0.4534, -0.1520, -0.1455, -0.6270, -0.5966, 0.0196, -0.6381, -0.8116, -0.9167, -0.7958, 0.1991, -0.2804, -0.0529, 1.1106, -0.1081, -0.3768, -0.5180, 0.2923, 0.1434, -0.1453, -0.4739, -0.1957, -0.1626, -0.6118, -0.6559, -0.0194, -0.6152, -0.8217, -0.9228, -0.7925, 0.1948, -0.2785, -0.0864, 1.1034, -0.0939, -0.3669, -0.5078, 0.2277, 0.1353, -0.1672, -0.4974, -0.2432, -0.1769, -0.5757, -0.7056, -0.0443, -0.6037, -0.8284, -0.9242, -0.7767, 0.1782, -0.2688, -0.0982, 1.1001, -0.0716, -0.3514, -0.4960, 0.1542, 0.1336, -0.1817, -0.5105, -0.2826, -0.1907, -0.5309, -0.7323, -0.0590, -0.6065, -0.8318, -0.9272, -0.7494, 0.1489, -0.2637, -0.0982, 1.0788, -0.0448, -0.3334, -0.4848, 0.0871, 0.1384, -0.1879, -0.4957, -0.3076, -0.2094, -0.4918, -0.7364, -0.0665, -0.6210, -0.8341, -0.9331, -0.7208, 0.1247, -0.2630, -0.0982, 1.0014, -0.0200, -0.3183, -0.4736, 0.0488, 0.1439, -0.1902, -0.4441, -0.2913, -0.2350, -0.4669, -0.7218, -0.0709, -0.6414, -0.8378, -0.9357, -0.7029, 0.1172, -0.2147, -0.0921, 0.8723, 0.0008, -0.3094, -0.4611, 0.0385, 0.1478, -0.1973, -0.3842, -0.2442, -0.2646, -0.4552, -0.6964, -0.0780, -0.6599, -0.8401, -0.9314, -0.7011, 0.1248, -0.1633, -0.0805, 0.7075, 0.0224, -0.3058, -0.4455, 0.0359, 0.1445, -0.2194, -0.3401, -0.1731, -0.2970, -0.4510, -0.6736, -0.0922, -0.6690, -0.8359, -0.9239, -0.7133, 0.1301, -0.1380, -0.0599, 0.5064, 0.0474, -0.3031, -0.4212, 0.0294, 0.1209, -0.2688, -0.3146, -0.1080, -0.3367, -0.4502, -0.6671, -0.1120, -0.6654, -0.8259, -0.9181, -0.7311, 0.1286, -0.1361, -0.0493, 0.3510, 0.0708, -0.2980, -0.3822, 0.0236, 0.0711, -0.3499, -0.3035, -0.0619, -0.3839, -0.4523, -0.6830, -0.1272, -0.6514, -0.8165, -0.9138, -0.7391, 0.1140, -0.1361, -0.0493, 0.2939, 0.0851, -0.2896, -0.3332, 0.0284, 0.0155, -0.4486, -0.2947, -0.0299, -0.4292, -0.4599, -0.7133, -0.1277, -0.6322, -0.8112, -0.9075, -0.7243, 0.1022, -0.1361, -0.0493, 0.2857, 0.0880, -0.2809, -0.2873, 0.0461, -0.0260, -0.5348, -0.2836, -0.0017, -0.4582, -0.4744, -0.7425, -0.1160, -0.6136, -0.8051, -0.8995, -0.6919, 0.1040, -0.1361, -0.0546, 0.2885, 0.0837, -0.2753, -0.2545, 0.0695, -0.0421, -0.5810, -0.2682, 0.0266, -0.4637, -0.4907, -0.7573, -0.1050, -0.6001, -0.7886, -0.8898, -0.6587, 0.1121, -0.1361, -0.0602, 0.2885, 0.0769, -0.2745, -0.2365, 0.0910, -0.0383, -0.5919, -0.2536, 0.0590, -0.4511, -0.5011, -0.7502, -0.1062, -0.5957, -0.7583, -0.8743, -0.6345]

# Convert the feature list to a NumPy array of type float32
np_features = np.array(features, dtype=np.float32)

# Add dimension to input sample (TFLite model expects (# samples, data))
np_features = np.expand_dims(np_features, axis=0)

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=model_path)

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Allocate tensors
interpreter.allocate_tensors()

# Print the input and output details of the model
print(input_details)
print(output_details)

# Create input tensor out of raw features
interpreter.set_tensor(input_details[0]['index'], np_features)

# Run inference
interpreter.invoke()

# output_details[0]['index'] = the index which provides the output
output_data = interpreter.get_tensor(output_details[0]['index'])

# Print the results of inference
print("Inference output is {}".format(output_data))

Hope that helps!


Great, thx @shawn_edgeimpulse! Your program looks similar to what I've been able to accomplish, but yours is much simpler (in a positive way).
Just 20 minutes ago I was able to get my program to run inference on the same batch of files that was used to train the project you mentioned. Btw, no problem at all that you used it as a reference!

While not needed right now, how would you use the quantized (int8) model? I guess the features need to be converted to the -128 to 127 or 0 to 255 range? And of course the proper .lite model file has to be used.

Hi @ThomasVikstrom,

Yes, the int8 quantized model expects values in the [-128, 127] range as input and will give you prediction values in the same range. Quantization is accomplished by looking at the range of expected input and output values to determine a scale value and a zero point value. See these two articles for more reference:

You must use those values to transform your floating point input to int8 values and your int8 prediction results back to floating point. You can use something like Netron to examine the quantization scale and zero point values for the inputs and outputs.
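
In short, TFLite int8 models use the standard affine mapping between real and quantized values; the helper functions below are just for illustration:

import numpy as np

# quantized  = round(real_value / scale) + zero_point
# real_value = scale * (quantized - zero_point)

def quantize(x, scale, zero_point):
    return np.round(np.asarray(x) / scale) + zero_point

def dequantize(q, scale, zero_point):
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)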

As those values are embedded in the quantized model, we can update our inference code to look for them and scale the input/output accordingly:

import numpy as np
import tensorflow as tf

# Location of tflite model file (float32)
#model_path = "ei-muse-wave-01-nn-classifier-tensorflow-lite-float32-model.lite"

# Location of tflite model file (int8 quantized)
model_path = "ei-muse-wave-01-nn-classifier-tensorflow-lite-int8-quantized-model.lite"

# Processed features (copy from Edge Impulse Project)
features = [0.5045, 0.2956, 0.1463, 0.8474, -0.1147, -0.3830, -0.3129, 0.2348, -0.0390, 0.4030, 0.1532, -0.2150, 0.0635, 0.2356, -0.1113, 0.2237, -0.5243, -0.7554, -0.9192, -0.6450, 0.4948, 0.2977, 0.1473, 0.8474, -0.1055, -0.3648, -0.3012, 0.2739, -0.0144, 0.4211, 0.1704, -0.1891, 0.0744, 0.2471, -0.1042, 0.2698, -0.5040, -0.7547, -0.9104, -0.6222, 0.4696, 0.3029, 0.1517, 0.8474, -0.0993, -0.3501, -0.2949, 0.3180, 0.0208, 0.4254, 0.1764, -0.1423, 0.0927, 0.2533, -0.0976, 0.3256, -0.4856, -0.7511, -0.8942, -0.6173, 0.4238, 0.3078, 0.1517, 0.8474, -0.1008, -0.3425, -0.2917, 0.3689, 0.0510, 0.4259, 0.1789, -0.0829, 0.1318, 0.2620, -0.0895, 0.3957, -0.4822, -0.7501, -0.8798, -0.6396, 0.4238, 0.3078, 0.1517, 0.8474, -0.1008, -0.3406, -0.2921, 0.3689, 0.0510, 0.4261, 0.1830, -0.0829, 0.1318, 0.2758, -0.0807, 0.3957, -0.4822, -0.7556, -0.8758, -0.6396, 0.4238, 0.3078, 0.1517, 0.8474, -0.1008, -0.3414, -0.2980, 0.3689, 0.0510, 0.4267, 0.1888, -0.0829, 0.1318, 0.2919, -0.0742, 0.3957, -0.4822, -0.7649, -0.8862, -0.6396, 0.4238, 0.3078, 0.1511, 0.8474, -0.1008, -0.3437, -0.3081, 0.3689, 0.0510, 0.4278, 0.1939, -0.0829, 0.1318, 0.3053, -0.0731, 0.3957, -0.4822, -0.7718, -0.9075, -0.6396, 0.4238, 0.3065, 0.1385, 0.8474, -0.1008, -0.3480, -0.3187, 0.3689, 0.0510, 0.4292, 0.1959, -0.0829, 0.1318, 0.3122, -0.0823, 0.3957, -0.4822, -0.7722, -0.9312, -0.6396, 0.0617, 0.2992, 0.1236, 0.0446, 0.0119, -0.3562, -0.3281, -0.0090, -0.0712, 0.4296, 0.1938, -0.3609, 0.1663, 0.3123, -0.1057, 0.1115, -0.8618, -0.7682, -0.9502, -1.0010, 0.0891, 0.2812, 0.1063, 0.1000, 0.0358, -0.3719, -0.3392, -0.0077, -0.0478, 0.4259, 0.1862, -0.4176, 0.1549, 0.3050, -0.1472, 0.1326, -0.8235, -0.7657, -0.9639, -1.0240, 0.0899, 0.2636, 0.0704, 0.1569, 0.0312, -0.3964, -0.3573, -0.0256, -0.0340, 0.4059, 0.1671, -0.4429, 0.1345, 0.2822, -0.2071, 0.1630, -0.7963, -0.7687, -0.9769, -1.0332, 0.0901, 0.2354, 0.0330, 0.1852, 0.0075, -0.4251, -0.3869, -0.0554, -0.0190, 0.3515, 0.1193, -0.4093, 0.1083, 0.2280, -0.2850, 0.1938, -0.7782, -0.7766, -0.9949, -1.0415, 0.1084, 0.1697, 0.0067, 0.2029, -0.0237, -0.4508, -0.4279, -0.0880, -0.0027, 0.2433, 0.0233, -0.3326, 0.0874, 0.1311, -0.3759, 0.2173, -0.7608, -0.7854, -1.0186, -1.0542, 0.1351, 0.0611, -0.0376, 0.2073, -0.0540, -0.4702, -0.4745, -0.1186, 0.0067, 0.0684, -0.1267, -0.2450, 0.0788, -0.0020, -0.4654, 0.2331, -0.7384, -0.7912, -1.0444, -1.0706, 0.1622, -0.0576, -0.1048, 0.2165, -0.0798, -0.4868, -0.5167, -0.1470, 0.0033, -0.1649, -0.3026, -0.1773, 0.0766, -0.1358, -0.5332, 0.2435, -0.7132, -0.7947, -1.0691, -1.0855, 0.1915, -0.1468, -0.1593, 0.2192, -0.1021, -0.5056, -0.5456, -0.1718, -0.0048, -0.3929, -0.4397, -0.1322, 0.0737, -0.2205, -0.5654, 0.2500, -0.6926, -0.7989, -1.0939, -1.0946, 0.2287, -0.1600, -0.1958, 0.2192, -0.1220, -0.5269, -0.5603, -0.1887, -0.0115, -0.5305, -0.4955, -0.0995, 0.0672, -0.2542, -0.5750, 0.2519, -0.6820, -0.8042, -1.1226, -1.0975, 0.2645, -0.1639, -0.2376, 0.2406, -0.1388, -0.5472, -0.5673, -0.1950, -0.0180, -0.5791, -0.5020, -0.0802, 0.0591, -0.2732, -0.5819, 0.2490, -0.6800, -0.8059, -1.1554, -1.0910, 0.2699, -0.1967, -0.2685, 0.2665, -0.1491, -0.5639, -0.5760, -0.1913, -0.0195, -0.6013, -0.5087, -0.0722, 0.0506, -0.3018, -0.5909, 0.2420, -0.6806, -0.8001, -1.1858, -1.0680, 0.2699, -0.2706, -0.2410, 0.2682, -0.1487, -0.5778, -0.5909, -0.1815, -0.0185, -0.6245, -0.5328, -0.0773, 0.0405, -0.3523, -0.5924, 0.2311, -0.6789, -0.7894, -1.2042, -1.0289]

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=model_path)

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Allocate tensors
interpreter.allocate_tensors()

# Print the input and output details of the model
print()
print("Input details:")
print(input_details)
print()
print("Output details:")
print(output_details)
print()

# Convert features to NumPy array
np_features = np.array(features)

# If the expected input type is int8 (quantized model), rescale data
input_type = input_details[0]['dtype']
if input_type == np.int8:
    input_scale, input_zero_point = input_details[0]['quantization']
    print("Input scale:", input_scale)
    print("Input zero point:", input_zero_point)
    print()
    np_features = (np_features / input_scale) + input_zero_point
    
# Convert features to NumPy array of expected type
np_features = np_features.astype(input_type)

# Add dimension to input sample (TFLite model expects (# samples, data))
np_features = np.expand_dims(np_features, axis=0)

# Create input tensor out of raw features
interpreter.set_tensor(input_details[0]['index'], np_features)

# Run inference
interpreter.invoke()

# output_details[0]['index'] = the index which provides the output
output = interpreter.get_tensor(output_details[0]['index'])

# If the output type is int8 (quantized model), rescale data
output_type = output_details[0]['dtype']
if output_type == np.int8:
    output_scale, output_zero_point = output_details[0]['quantization']
    print("Raw output scores:", output)
    print("Output scale:", output_scale)
    print("Output zero point:", output_zero_point)
    print()
    output = output_scale * (output.astype(np.float32) - output_zero_point)

# Print the results of inference
print("Inference output:", output)

Hope that helps!


@shawn_edgeimpulse Awesome! Thx for the explanation and further information, this all makes sense.

However, when using identical features with the float32 and the int8 versions, the prediction outputs are different. I re-downloaded both models from EI just to be sure I'm using up-to-date models.
What am I missing here?

(tensorflow_env) C:\Temp>python Shawn_inference_float32.py
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[{'name': 'serving_default_x:0', 'index': 0, 'shape': array([  1, 400]), 'shape_signature': array([  1, 400]), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
[{'name': 'StatefulPartitionedCall:0', 'index': 10, 'shape': array([1, 2]), 'shape_signature': array([1, 2]), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

===> Inference output is [[0.06041268 0.9395873 ]]


(tensorflow_env) C:\Temp>python Shawn_inference_int8.py

Input details:
[{'name': 'serving_default_x:0', 'index': 0, 'shape': array([  1, 400]), 'shape_signature': array([  1, 400]), 'dtype': <class 'numpy.int8'>, 'quantization': (0.0068965875543653965, 61), 'quantization_parameters': {'scales': array([0.00689659], dtype=float32), 'zero_points': array([61]), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output details:
[{'name': 'StatefulPartitionedCall:0', 'index': 10, 'shape': array([1, 2]), 'shape_signature': array([1, 2]), 'dtype': <class 'numpy.int8'>, 'quantization': (0.00390625, -128), 'quantization_parameters': {'scales': array([0.00390625], dtype=float32), 'zero_points': array([-128]), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Input scale: 0.0068965875543653965
Input zero point: 61

Raw output scores: [[ 60 -60]]
Output scale: 0.00390625
Output zero point: -128

===> Inference output: [[0.734375 0.265625]]

Btw, what I'm trying to achieve is to use an EEG device to distinguish between me thinking "left" or "right". Version 0.1 of the Python prototype is now working, so I can focus on collecting more data. Focus is the right word, as it's not easy to concentrate on only one thing at a time :slight_smile:
I'm not sure if this will work in the end, but I'll never know if I don't try.
The current setup is that I record 2 seconds of data; if that does not give good results, I'll later try shorter or longer samples. I also need to collect random data for a third class, "random". Below is what is output to the terminal window after each inference (with the float32 model); if the inference output is below the confidence threshold of 0.7, dashes (----) are printed. A sketch of the decision logic follows the listing.

L:0.9547519088 - R:0.0452480987     Left
L:0.0019846626 - R:0.9980152845     Right
L:0.8282242417 - R:0.1717757583     Left
L:0.9390486479 - R:0.0609513074     Left
L:0.0024030416 - R:0.9975969195     Right
L:0.2209503204 - R:0.7790496349     Right
L:0.5797867775 - R:0.4202132225     ----
L:0.9656862020 - R:0.0343138501     Left
L:0.0080929752 - R:0.9919070005     Right
L:0.9464728832 - R:0.0535271615     Left
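
For completeness, the decision logic behind that listing is roughly the following (a sketch with made-up variable names; the real script also handles recording the data):

THRESHOLD = 0.7  # confidence threshold

def decide(left_score, right_score, threshold=THRESHOLD):
    # Return the label to print, or dashes if neither class is confident enough
    if left_score >= threshold:
        return "Left"
    if right_score >= threshold:
        return "Right"
    return "----"

# Example with scores from one of the inferences above
left_score, right_score = 0.5797867775, 0.4202132225
print("L:{:.10f} - R:{:.10f}     {}".format(left_score, right_score,
                                            decide(left_score, right_score)))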

Hi @ThomasVikstrom,

Try downloading the .tflite file again. The same thing happened to me, and it seems that I was working with an outdated .tflite file. I also made a slight edit to the code to round the input values first. The updated inference example can be found here: https://gist.github.com/ShawnHymel/f7b5014d6b725cb584a1604743e4e878
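
In case it's useful, the change is essentially to round the scaled values before casting in the quantization step of the earlier script (a sketch of the idea; see the gist for the exact code):

# Before: np_features = (np_features / input_scale) + input_zero_point
# Rounding first avoids the values being truncated toward zero by the cast:
np_features = np.round(np_features / input_scale) + input_zero_point
np_features = np_features.astype(input_type)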

After updating the code and .tflite file, I get the following output (using your first “left” sample in the test set):

Input details:
[{'name': 'serving_default_x:0', 'index': 0, 'shape': array([  1, 400]), 'shape_signature': array([  1, 400]), 'dtype': <class 'numpy.int8'>, 'quantization': (0.010733922943472862, -15), 'quantization_parameters': {'scales': array([0.01073392], dtype=float32), 'zero_points': array([-15]), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output details:
[{'name': 'StatefulPartitionedCall:0', 'index': 10, 'shape': array([1, 2]), 'shape_signature': array([1, 2]), 'dtype': <class 'numpy.int8'>, 'quantization': (0.00390625, -128), 'quantization_parameters': {'scales': array([0.00390625], dtype=float32), 'zero_points': array([-128]), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Input scale: 0.010733922943472862
Input zero point: -15

Raw output scores: [[ 127 -127]]
Output scale: 0.00390625
Output zero point: -128

Inference output: [[0.99609375 0.00390625]]

This matches the output when I download the quantized .eim file and run inference using the same raw features with this code: https://github.com/edgeimpulse/linux-sdk-python/blob/master/examples/custom/classify.py.

pi@raspberrypi:~/Projects/edge-impulse/muse-wave-01-eim $ python classify.py modelfile.eim features.txt
MODEL: /home/pi/Projects/edge-impulse/muse-wave-01-eim/modelfile.eim
Loaded runner for "Thomas Vikström / Muse Wave 01"
classification:
{'classification': {'left': 0.99609375, 'right': 0.00390625}}
timing:
{'anomaly': 0, 'classification': 0, 'dsp': 0, 'json': 0, 'stdin': 28}

Well, I don't know why the first int8 and float32 model files and programs I used didn't produce similar results; I did re-download those model files at the same time (within 3 seconds of each other).
But nevertheless, with this newest version I get the same results as you, and the results are also close to identical between the float32 and int8 outputs.

So this is now solved, thx a lot @shawn_edgeimpulse for your support! :+1:
