Inconsistent C++ deployment results

My C++ code is based on example_standalone_inferencing from GitHub and the C++11 library export.

My code:

1 - read in the wav file
2 - strip the header and convert to 16-bit PCM
3 - extract the remaining wav data needed for the window size, convert to 16-bit PCM, pass (*with header) into the raw features vector
4 - run inferencing (same exact code as the example), sprintf the results, and write the output
5 - move the file pointer and repeat N times (rough sketch below)
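Roughly, the windowing loop looks like this (a simplified sketch with illustrative names — run_inference, pcm_data, and the window sizes are placeholders, not my exact code):

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper around the example's classifier + sprintf/output code.
void run_inference(const std::vector<float> &raw_features);

// Slide a window across the PCM samples (header already stripped).
void classify_file(const std::vector<int16_t> &pcm_data,
                   size_t window_samples, size_t window_increase) {
    for (size_t ptr = 0; ptr + window_samples <= pcm_data.size(); ptr += window_increase) {
        std::vector<float> raw_features;
        for (size_t i = 0; i < window_samples; i++) {
            raw_features.push_back((float)pcm_data[ptr + i]); // 16-bit samples as floats
        }
        run_inference(raw_features);
    }
}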

Each time a new model is trained and deployed I delete all 3 library files and recompile with new files. The results have been perfect for the majority of deployments with the exception of two models.

One did not work whatsoever: everything was classified as the first label at >99% confidence. Retraining the model didn't help; I had to scrap and redo the whole project for it to work. The newer deployment works exactly as expected and tested in the “model testing” tab, with the exception of one class, “ting”. Not sure if this is a bug. I thought it could possibly be overfitting, so I added a 0.1 dropout layer and model testing improved, but the “ting” label still does not deploy properly.

Ting in the testing dashboard: [screenshot]

Ting deployment in C++ (Ting is label 3, beginning with 0.16016): [screenshot]

Hi @Kerkenstuff,

The features vector expects 16-bit PCM format. Could it be that your f32 signal is scaled to [-1, 1]?
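If so, scaling back up before filling the features vector would look something like this (a quick sketch, assuming a normalized float sample):

#include <cstdint>

// Sketch: a normalized sample f in [-1.0, 1.0] needs to be scaled back
// to the 16-bit PCM range before it goes into the features vector.
int16_t to_pcm16(float f) {
    return (int16_t)(f * 32767.0f);
}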

You can check that inference works correctly by copying a raw features sample from the Live Classification tab:

Copy the content to your features vector and check that the correct “ting” label is detected.
If that works, the issue is most likely on the wav data extraction side.
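For example, along these lines (a sketch based on the standalone example; the array contents are a placeholder for the values you paste in):

#include <cstring>

// Paste the raw features copied from the Live Classification tab here:
static const float features[] = {
    // e.g. 12, -4, 381, ...
};

int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return 0;
}

Then point signal.get_data at this callback, set signal.total_length to the number of pasted values, and run the classifier once.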

Aurelien


@Kerkenstuff, this looks like the input data is not being passed in correctly. The output from the Studio and the C++ library matches 99.9% when deploying an f32 model (and typically 95% for int8).

“extract remaining wav data needed for window size, convert to f32, pass (*with header) into raw features vector”

If you have normal 16-bit PCM data you should:

  • Skip the 44-byte WAV header.
  • Read the rest of the file 2 bytes at a time into an int16_t features[16000] buffer.
  • Construct a signal from the buffer above:
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

static int16_t features[16000]; // raw PCM samples, header already skipped

int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    return numpy::int16_to_float(features + offset, out_ptr, length);
}

int main() {
    signal_t signal;
    signal.total_length = 16000;
    signal.get_data = &raw_feature_get_data;

    ei_impulse_result_t result;
    run_classifier(&signal, &result, false);
    return 0;
}
That should get you the data from a WAV file in C/C++.
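The read itself could be as simple as this (a sketch that assumes a canonical 44-byte header and a 16 kHz mono 16-bit clip; real WAV files can carry extra chunks, so a robust parser should walk the chunk list):

#include <cstdio>
#include <cstdint>

static int16_t features[16000];

// Returns the number of samples actually read.
size_t read_wav_samples(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    fseek(f, 44, SEEK_SET); // skip the RIFF/fmt/data headers
    size_t n = fread(features, sizeof(int16_t), 16000, f);
    fclose(f);
    return n;
}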

If you could post the code you use to parse the WAV file, that’d be helpful.


@aurel You are correct! I misspoke: the data is read in at 16-bit depth. I will edit my original post to clarify that. I have not yet written the code to accept a copied-in list of features, but I can work on that!

@janjongboom This is (probably a better and more straightforward way to go about) the same approach I am currently using: strip off the header, fill the vector with the datatype-converted header, then raw_features.push_back the remaining data starting from the pointer location (which begins at the end of the header and increments by the window increase).

If it is the case that I am passing in the wrong data, is it possible that it could work correctly with some models and not with others? The only difference between the model that fails and the others is the increased number of labels. Even so, all of the other labels pass bench testing perfectly. I am not sure what different NN failure modes look like, but trying to diagnose based on symptoms is strange in this case, because I would assume that if I were passing the data in incorrectly it would give wildly wrong results.

Thanks for the quick replies!

Hi @Kerkenstuff

“If it is the case that I am passing in the wrong data, is it possible that it could work correctly with some models and not with others? The only difference between the model that fails and the others is the increased number of labels. Even so, all of the other labels pass bench testing perfectly. I am not sure what different NN failure modes look like, but trying to diagnose based on symptoms is strange in this case, because I would assume that if I were passing the data in incorrectly it would give wildly wrong results.”

Are you sure the models have exactly the same number of features, same frequency, etc.?
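One quick way to verify is to print the generated model parameters after every export (assuming the usual constants from model-parameters/model_metadata.h in the C++ library):

#include <cstdio>
#include "model-parameters/model_metadata.h"

// Confirm the deployed library matches the impulse you expect.
int main() {
    printf("raw samples: %d, frequency: %d, labels: %d\n",
           EI_CLASSIFIER_RAW_SAMPLE_COUNT,
           EI_CLASSIFIER_FREQUENCY,
           (int)EI_CLASSIFIER_LABEL_COUNT);
    return 0;
}

If those values differ between the working and failing deployments, that would explain the symptom.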

I’d be happy to take a look at the code. You can also send it to me at jan@edgeimpulse.com if you don’t want to share it here.