We have recently started using this system and have come across some unusual test results. On reviewing the results, we are confused as to why some samples have been classified incorrectly. We are using time series data, which requires windowing. We changed the confidence threshold to see whether this would make a difference, but the issue persists.
For example, one sample is classified correctly for 77% of the windows but is still classified incorrectly overall. Please see below.
This may be a misunderstanding on our part, but any clarification would be great.
Okay, thank you. So the accuracy denotes the proportion of windows classified correctly, rather than being computed on a per-sample basis. We just verified this ourselves on a small test set. We were unsure about the red/green color coding and whether the accuracy was based on it.
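
In case it helps anyone else, here is a rough sketch of the distinction as we understand it. It is not the system's actual code. The record layout, the function names, and especially the `cutoff` rule for marking a whole sample as correct are our own assumptions for illustration; we do not know what rule the system uses to decide red/green.

```python
from collections import defaultdict

# Hypothetical data: one (sample_id, true_label, predicted_label) tuple per window.
# Sample "a" has 13 windows, 10 of them correct (about 77%).
# Sample "b" has 10 windows, all correct.
records = (
    [("a", "walk", "walk")] * 10
    + [("a", "walk", "run")] * 3
    + [("b", "run", "run")] * 10
)

def window_accuracy(records):
    """Fraction of all windows whose prediction matches the true label."""
    correct = sum(1 for _, true, pred in records if true == pred)
    return correct / len(records)

def per_sample_fraction_correct(records):
    """Map each sample_id to the fraction of its windows classified correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sample_id, true, pred in records:
        totals[sample_id] += 1
        hits[sample_id] += int(true == pred)
    return {s: hits[s] / totals[s] for s in totals}

def sample_accuracy(records, cutoff):
    """Fraction of samples counted as correct.

    ASSUMPTION: a sample counts as correct only if at least `cutoff`
    of its windows are correct. This rule is hypothetical.
    """
    fractions = per_sample_fraction_correct(records)
    passed = sum(1 for f in fractions.values() if f >= cutoff)
    return passed / len(fractions)

fractions = per_sample_fraction_correct(records)
print(round(fractions["a"], 2))            # 0.77
print(round(window_accuracy(records), 2))  # 0.87
print(sample_accuracy(records, 0.8))       # 0.5
```

With these made-up numbers, 20 of 23 windows are correct, so the window-level accuracy is about 87%, while under an assumed 80% cutoff only one of the two samples would count as correct. That is how a sample can have 77% of its windows right and still show as incorrect overall, if the per-sample rule is stricter than a simple majority.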