We have recently been using this system and came across some unusual testing results. Upon reviewing the outcomes, we are confused as to why some samples have been classified incorrectly. We are using time series data, which requires windowing. We changed the confidence threshold to see whether this would change anything, but the issue persists.
For example, one sample is classified correctly for 77% of its windows but is still classified incorrectly overall. Please see below.
This may be a misunderstanding on our part, but any clarification on this would be great.
The project ID is 141188. We understand the role of the “uncertain” classification. We just don’t understand why a sample with high overall window accuracy is not classed as correct overall.
On an unrelated note, I think I saw this in another thread but just wanted to check: testing runs on the unquantized float32 model. Is that right?
I hope you are well. We briefly met at the EMEA event in Cyprus. We tried a range of settings, including the default of 0.6 and several others both higher and lower.
Okay, thank you. So the accuracy denotes the number of windows classified correctly rather than being computed on a per-sample basis. We just verified this ourselves on a small test set. We were unsure about the red/green marking and whether the accuracy was based upon it.
The decision to mark a sample green or red overall is based on a fixed 80% accuracy threshold. The threshold you set applies to individual windows only, as mentioned by Louis.
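To illustrate why the 77% sample above comes out red, here is a minimal sketch of the aggregation behavior as described in this thread. This is not the platform's actual code; the function name and structure are hypothetical, and it only assumes the two facts stated here: the per-sample green/red verdict uses a fixed 80% window-accuracy cutoff, while the threshold you configure applies to individual windows.

```python
# Hypothetical sketch of the per-sample aggregation described above.
# A sample is green only if >= 80% of its windows were classified
# correctly; this cutoff is fixed and separate from the configurable
# per-window confidence threshold.

SAMPLE_ACCURACY_THRESHOLD = 0.80  # fixed, not user-configurable

def sample_verdict(window_results):
    """window_results: list of booleans, True if that window was correct."""
    accuracy = sum(window_results) / len(window_results)
    return "green" if accuracy >= SAMPLE_ACCURACY_THRESHOLD else "red"

# 77 of 100 windows correct -> 77% accuracy, below the 80% cutoff:
windows = [True] * 77 + [False] * 23
print(sample_verdict(windows))  # red
```

So a sample at 77% window accuracy lands just under the fixed cutoff and is marked red, even though most of its windows were classified correctly.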