Hello,
First of all, I have no idea whether this is a bug or the intended behaviour, but either way it does not meet my needs.
I have run into a pretty unpleasant issue. I was using the data forwarder to collect 3-axis accelerometer and 3-axis gyroscope data. I have already spent a lot of time and effort collecting and splitting data for over 15 labels, and only recently started to doubt the sample lengths. I trained a model on 5 labels just for live testing, and it performed terribly. I am now sure the problem is related to how the data was collected and interpreted by the data forwarder.
Here is what happened: I used essentially the same Arduino code provided on the data forwarder docs page, with only minor modifications. I set the sampling frequency to 125 Hz in the code. When I first uploaded that code to my ESP32 and started the data forwarder, it detected 100 Hz; sometimes it was 98 or 99. There was always a 23-28 Hz difference between the defined and detected frequencies. I guess the cause might be serial communication or something else that introduces delay. For that reason, I used the --frequency 125 flag for consistency, as I thought it would fix the frequency to that value. I started recording 10s samples and then splitting each of them into the actual windows where motion was detected. The problem is that each 10s recording was stored as 8s once it finished. I didn’t pay attention to this, as I hadn’t realised how it could affect inference.
As far as I can tell, the actual sampling rate was 100 Hz, but the data forwarder was treating it as 125 Hz. That means each real second delivered only 100 points, so an extra 25 points were taken from the next second to make up one "second" of 125 points. Roughly 1250 ms of real time was therefore treated as 1000 ms, which shrank each recording from 10s to 8s overall (sometimes it was 7800 ms or 8300 ms, depending on the frequency that was actually achieved). I was worried about the sample lengths because they seemed unnaturally short. Then, for testing purposes, I raised the frequency in the code to 150 Hz and somehow managed to hit an actual 125 Hz. The samples for the same labels were longer in this case, and testing the model on that data gave terrible results. Each (actual) sample I tested was about 200-250 ms longer than the (shrunken) samples the model was trained on.
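To make the arithmetic above concrete, here is a minimal Python sketch. The function name is my own and not part of the data forwarder; it only assumes that the recorded duration is computed as the number of received samples divided by the declared frequency, which is what the numbers I observed suggest.

```python
def apparent_duration_ms(real_duration_ms, actual_hz, declared_hz):
    """Duration a recording appears to have when samples that really
    arrive at actual_hz are labelled as if they arrived at declared_hz.

    Assumption: the tool derives time from sample count / declared_hz.
    """
    n_samples = real_duration_ms * actual_hz / 1000  # samples really captured
    return n_samples * 1000 / declared_hz            # time assigned to them


# 10 s of real time at an actual 100 Hz, labelled as 125 Hz
print(apparent_duration_ms(10_000, 100, 125))  # 8000.0

# Real time that gets packed into one "second" of 125 points
print(1000 * 125 / 100)  # 1250.0
```

With an actual rate of 97.5 Hz the same formula gives 7800 ms, which matches the shorter recordings I saw.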
I don’t know, maybe that’s expected behaviour and I should have been more careful when collecting data, but the fact is that I have gathered data for 15 labels and split it by hand, which took a lot of time. The main problem is that when I deploy the model, inference will run on data with its actual duration, not the shrunken one. The shrinkage was introduced by the data forwarder, and I don’t know whether implementing the same behaviour in my own code to shrink the data would give proper results, or whether it is even possible.
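For what it's worth, this is roughly what I imagine "shrinking" on my side would look like. It is an untested sketch, not something the data forwarder provides: the helper name `resample_linear` is hypothetical, it handles a single axis, and it assumes the real sampling rates are known, stable integers. The idea is to take data captured at a real 125 Hz and interpolate it down to 100 points per real second, so that it has the same spacing as the training data that was captured at a real 100 Hz but labelled as 125 Hz.

```python
def resample_linear(samples, src_hz, dst_hz):
    """Linearly interpolate one axis of samples from src_hz to dst_hz.

    src_hz and dst_hz are assumed to be integer, real sampling rates.
    """
    if len(samples) < 2:
        return list(samples)
    last = len(samples) - 1
    # Number of output points covering the same real time span
    n_out = last * dst_hz // src_hz + 1
    out = []
    for i in range(n_out):
        pos = i * src_hz / dst_hz      # position in source-index units
        lo = min(int(pos), last)       # sample at or before pos
        hi = min(lo + 1, last)         # sample after pos (clamped at the end)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# One real second captured at a real 125 Hz (a simple ramp for illustration)
at_125 = [float(i) for i in range(126)]
at_100 = resample_linear(at_125, 125, 100)
print(len(at_100), at_100[:3])  # 101 [0.0, 1.25, 2.5]
```

Whether a model trained on the shrunken data would behave well on input resampled like this is exactly what I am unsure about.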
If anyone has encountered the same problem and handled it successfully, I would be grateful for any advice. Thanks in advance.