Model testing: high single sample accuracy (above threshold) yet still classed as incorrect

Gibbs · November 2, 2022, 3:05pm

Hi there,

We have recently been using this system and came across some unusual testing results. Upon reviewing testing outcomes, we are confused as to why some samples have been classified incorrectly. We are using time series data, which requires windowing. We changed the confidence threshold to see whether this would change anything but the issue seems to still persist.

For example, one sample is classified correctly for 77% of the windows but is still classified incorrectly overall. Please see below.

This may be a misunderstanding on our part, but any clarification would be great.

Thank you

aurel · November 2, 2022, 5:38pm

Hi @Gibbs,

Could you let us know which project ID (or url) it is?

Aurelien

Gibbs · November 3, 2022, 9:57am

Hi Aurel,

The project ID is 141188. We understand the role of the “uncertain” classification. We just don’t understand why a sample with high overall accuracy is not classed as correct overall.

On an unrelated note, I think I saw this in another thread but just wanted to check: model testing runs on the unquantized float32 model. Is that right?

Thanks,
Michael

louis · November 4, 2022, 9:57am

Hello @Gibbs

Which settings have you set in the confidence threshold?

And to your other question: correct, model testing uses the unquantized float32 model.

Best,

Louis

Gibbs · November 4, 2022, 1:59pm

Hi Louis,

I hope you are well. We briefly met at the EMEA in Cyprus. We tried a range of settings including the default of 0.6 and a bunch of others both higher and lower.

Thanks,
Michael

louis · November 4, 2022, 2:54pm

Hi @Gibbs,

It seems that the confidence threshold is applied only to each individual window, not to the sample as a whole.

Thanks for the feedback, I’ll let our Studio/UX team know so they can find a better way to take this into account.

Best,

Louis
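
For readers following along, here is a minimal sketch of what a per-window threshold check might look like. It is not Edge Impulse's actual code: the function name `classify_window`, the score dictionary, and the use of `>=` for the comparison are all assumptions made for illustration. Only the 0.6 default and the "uncertain" label come from this thread.

```python
def classify_window(scores: dict[str, float], threshold: float = 0.6) -> str:
    """Return the top-scoring label for one window, or "uncertain".

    Hypothetical helper: `scores` maps each class label to the model's
    confidence for that window. Whether the real comparison is >= or >
    is an assumption.
    """
    if not scores:
        raise ValueError("scores must contain at least one label")
    # Pick the label with the highest confidence.
    label, score = max(scores.items(), key=lambda item: item[1])
    # The threshold only decides whether this single window is trusted.
    return label if score >= threshold else "uncertain"


confident = classify_window({"idle": 0.10, "wave": 0.90})
borderline = classify_window({"idle": 0.45, "wave": 0.55})
```

With the default threshold, the first window is labelled `wave` and the second comes back as `uncertain`. Raising or lowering the threshold changes how individual windows are labelled, which in turn changes the fraction of windows counted as correct.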

Gibbs · November 4, 2022, 3:23pm

Hi Louis,

Okay, thank you. So the accuracy denotes the proportion of windows classified correctly, rather than being computed on a per-sample basis. We just verified this ourselves on a small test set. We were unsure about the red/green marking and whether the accuracy was based on it.

Thank you for all the help.

All the best,
Michael

aurel · November 4, 2022, 5:33pm

Hi @Gibbs,

The decision to mark a sample green or red overall is based on a fixed 80% accuracy. The threshold you set applies to windows only, as mentioned by Louis.

Aurelien
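
To make the arithmetic concrete, here is a rough sketch of the sample-level decision described above, applied to a sample with about 77% of its windows correct. The names `SAMPLE_PASS_FRACTION` and `sample_result`, the example labels, and the `>=` comparison are assumptions for illustration, not the Studio's real implementation. Only the fixed 80% figure comes from this thread.

```python
SAMPLE_PASS_FRACTION = 0.80  # fixed value mentioned in this thread


def sample_result(window_labels: list[str], expected: str) -> tuple[float, bool]:
    """Return (fraction of correct windows, whether the sample is green).

    Hypothetical helper: windows labelled "uncertain" simply count as
    not matching the expected label.
    """
    if not window_labels:
        raise ValueError("a sample needs at least one window")
    correct = sum(1 for label in window_labels if label == expected)
    accuracy = correct / len(window_labels)
    # Assumption: a sample at exactly 80% counts as green.
    return accuracy, accuracy >= SAMPLE_PASS_FRACTION


# 10 of 13 windows match the expected label, roughly 77%.
windows = ["wave"] * 10 + ["idle"] * 2 + ["uncertain"]
accuracy, is_green = sample_result(windows, "wave")
print(f"{accuracy:.0%} of windows correct, sample marked green: {is_green}")
# prints "77% of windows correct, sample marked green: False"
```

Under these assumptions, changing the confidence threshold can only move a sample from red to green if it pushes the fraction of correct windows to 80% or more.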