Sample length shrinkage and Data Forwarder inconsistency

mikaL · January 2, 2025, 7:25pm

Hello,
First of all, I have no idea whether this is a bug or how it is supposed to work, but it does not meet my needs.
I have run into a pretty unpleasant issue. I was using the data forwarder to collect 3-axis accelerometer and 3-axis gyroscope data. I have already spent a lot of time and effort collecting and splitting data for more than 15 labels, and only recently did I start to suspect the sample lengths. I trained a model on 5 labels just for live testing, and it performed terribly. Now I am sure the problem is related to how the data was collected and interpreted by the data forwarder.

Here is what happened: I used the Arduino code from the data forwarder docs page with small modifications, nothing special. I set the frequency to 125 Hz. When I first uploaded the code to my ESP32 and started the data forwarder, it detected 100 Hz, sometimes 98 or 99 Hz. There was always a 23-28 Hz difference between the defined and detected frequencies. I guess the reason might be serial communication or something else that causes delay. For that reason, I used the --frequency 125 flag for consistency, as I thought it would fix the frequency at that value. I sampled in 10 s intervals and then split each 10 s recording into the actual windows where motion was detected. The problem is that after a 10 s recording finished, it was treated as 8 s long. I didn't pay attention to it, as I hadn't realised how it could affect inference.
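A simple model of how a 125 Hz target can end up near 100 Hz: if the loop waits the nominal interval (8 ms at 125 Hz) and the time spent reading the sensors and printing is added on top of that wait, every period gets longer. The sketch below only models that assumption; the 2 ms overhead is a hypothetical figure, not a measurement from the ESP32 or from the docs example. For comparison it also models a fixed-deadline schedule, where the next sample time is advanced by exactly one period, so the work is absorbed as long as it fits inside the period.

```python
def effective_rate_hz(target_hz: float, overhead_ms: float) -> float:
    """Real loop rate when a fixed wait of 1/target_hz is followed by extra work.

    Assumes the timer is restarted after the work finishes, so the work time
    is added to every period instead of being absorbed into it.
    """
    nominal_period_ms = 1000.0 / target_hz
    return 1000.0 / (nominal_period_ms + overhead_ms)


def drift_free_rate_hz(target_hz: float, overhead_ms: float) -> float:
    """Real loop rate when each deadline is advanced by a fixed period.

    While the work fits inside one period the rate stays at the target;
    otherwise the loop can only run as fast as the work allows.
    """
    nominal_period_ms = 1000.0 / target_hz
    return 1000.0 / max(nominal_period_ms, overhead_ms)


print(effective_rate_hz(125, 2.0))   # 100.0
print(drift_free_rate_hz(125, 2.0))  # 125.0
```

On a microcontroller the fixed-deadline variant would typically be written with micros() and a "next deadline" variable that is advanced by the period each iteration, rather than restarting the timer after the work is done.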

As I observed, the actual sampling rate was 100 Hz, but the data forwarder treated it as 125 Hz. That means each second of real time produced only 100 points, but 125 points were counted as one second, so roughly 1250 ms of real time was treated as 1000 ms. This caused an overall shrinkage from 10 s to 8 s (sometimes 7800 ms or 8300 ms, depending on the detected frequency). I was worried about the sample lengths because they seemed unnaturally short. Then, for testing purposes, I raised the frequency in the code to 150 Hz and managed to hit an actual 125 Hz. Samples of the same labels were longer in this case, and testing the model on that data gave terrible results. Each (actual-length) sample I tested was about 200-250 ms longer than the (shrunk) samples the model was trained on.
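The shrinkage described above is plain arithmetic: the device produces actual_hz × real_seconds samples, and if each one is assumed to be 1/declared_hz apart, the recording appears shorter by the ratio actual_hz / declared_hz. A small helper (hypothetical name; this is not Data Forwarder code) makes the relationship explicit:

```python
def apparent_length_ms(real_seconds: float, actual_hz: float, declared_hz: float) -> float:
    """Length a recording appears to have when its samples are labeled with the wrong rate."""
    n_samples = actual_hz * real_seconds           # samples the device really produced
    return n_samples / declared_hz * 1000.0        # duration if spaced 1/declared_hz apart


print(apparent_length_ms(10.0, 100, 125))  # 8000.0
```

With a real rate of 98 Hz the same 10 s recording would appear to be about 7840 ms long, which is in the range of the lengths reported above.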

I don't know, maybe that's expected behaviour and I should have been more careful when collecting data, but the fact is that I have gathered data for 15 labels and split it by hand, which took a lot of time. The main problem is that when I deploy the model, inference will run on data with its real timing, not the shrunk timing. The shrinkage was done by the Data Forwarder, and I don't know whether reproducing the same behaviour in my code will give proper results, or whether it is even possible.
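One way to reason about the deployment question, assuming the deployed model (like most windowed time-series classifiers) consumes a fixed number of samples per window: that count is derived from the window length and the rate the training data was labeled with. If the device still samples at the same real rate during inference as during collection (about 100 Hz here), filling each window with that same number of samples reproduces the "shrunk" timing of the training data without explicit resampling. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def samples_per_window(window_ms: float, labeled_hz: float) -> int:
    """Number of samples in a window, computed from the rate the data was labeled with."""
    return round(window_ms * labeled_hz / 1000.0)


def real_span_ms(n_samples: int, actual_hz: float) -> float:
    """Real time the device needs to produce n_samples at its actual rate."""
    return n_samples / actual_hz * 1000.0


n = samples_per_window(1000, 125)  # a "1000 ms" training window labeled at 125 Hz
print(n, real_span_ms(n, 100))     # 125 1250.0
```

The caveat is that this only holds while the real rate matches the one during data collection; if it drifts (as with the 98-100 Hz readings), the real time covered by each window drifts with it.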

If anyone has encountered the same problem and handled it successfully, I would be grateful for any advice. Thanks in advance.

Eoin · January 15, 2025, 7:46pm

Hi @mikaL

Thanks for reporting this. Let me summarise your issues here:

Data Shrinkage: The 10-second sampling windows were shortened to 8 seconds (or similar durations). This occurred because the Data Forwarder treated the data as if sampled at 125 Hz, even though the actual frequency was closer to 100 Hz.

Auto-detection of the frequency is the main issue here, right? Can you give this a try? Passing the frequency flag will override the auto-detection:

$ edge-impulse-data-forwarder --frequency 100

Inference Issues: When the model trained on the “shrunk” data was tested with “actual” data, it performed poorly because the temporal alignment was mismatched.

Could this be a symptom of upscaling/downscaling you may have performed manually?

Resolution Attempts: You manually adjusted the frequency to compensate for the discrepancies but still had trouble aligning the data collected with the forwarder with the real-world inference data.

Did you try the scaling in our time-series block, or how did you attempt to adjust the resolution? The time-series block has upscaling you can use to get to 100 Hz; it will upsample or downsample automatically (the Frequency (Hz) input is what you need to adjust, e.g. to 100 Hz).
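To illustrate what resampling to a target rate does, here is a generic linear-interpolation resampler for a single axis. It is a sketch only and not how the Edge Impulse time-series block is implemented. Note that it gives correct timing only if src_hz is the rate the data was really captured at; resampling data whose rate label is already wrong carries the error along.

```python
def resample_linear(values, src_hz: float, dst_hz: float):
    """Resample one evenly spaced axis from src_hz to dst_hz by linear interpolation."""
    if not values:
        return []
    n_out = round(len(values) * dst_hz / src_hz)
    out = []
    for k in range(n_out):
        pos = k * src_hz / dst_hz  # position of output point k, in input-sample units
        i = int(pos)
        if i >= len(values) - 1:
            out.append(values[-1])  # past the last input sample: hold the final value
            continue
        frac = pos - i
        out.append(values[i] + (values[i + 1] - values[i]) * frac)
    return out


ramp = [float(i) for i in range(100)]  # 1 s of a ramp at 100 Hz
up = resample_linear(ramp, 100, 125)   # 125 points for the same second
print(len(up), up[5])                  # 125 4.0
```

For 6-axis data the same function would be applied to each axis separately.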

Let me know if this helps, and we can figure out what needs to be logged as a bug or doc fix. It looks like your main concern is the auto-detection of the frequency.

Best

Eoin

fyi @AlexE @brianmcfadden

mikaL · January 19, 2025, 1:44am

Hi Eoin,
Thanks for your reply. I was using

$ edge-impulse-data-forwarder --frequency 125
all the time for data collection.

I have not upscaled or downscaled anything manually. It was using the auto-detected frequency, which was 125 Hz.

When I entered a 10000 ms sampling length (to split each recording into around 4-5 smaller samples), the collected sample showed a length of around 8 s, which was weird, but I did not pay any attention to it. Then, a few weeks ago, when collecting data with the same command, it reported a detected frequency about 20-25 Hz lower than the one in my Arduino code. Back when I was even more of a noob than I am now, I started data acquisition without the --frequency flag, and every time I ran the command without it, a different frequency was detected. That's why I wanted to fix its value for data consistency.

The thing is that a 10 s recording should have given me around 1250 data points, but it was giving about 900-950.
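Working backwards from those counts gives the rate the device was really delivering. A small sketch (hypothetical helpers) that estimates it either from a sample count over a known duration, or from per-line arrival timestamps, for example recorded by a separate script reading the serial port:

```python
def rate_from_count(n_samples: int, real_seconds: float) -> float:
    """Average sampling rate implied by a sample count over a known real duration."""
    return n_samples / real_seconds


def rate_from_timestamps(timestamps_ms) -> float:
    """Average rate from arrival times in milliseconds (first to last line)."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    span_ms = timestamps_ms[-1] - timestamps_ms[0]
    return (len(timestamps_ms) - 1) * 1000.0 / span_ms


print(rate_from_count(925, 10.0))             # 92.5
print(rate_from_timestamps([0, 10, 20, 30]))  # 100.0
```

By this estimate, 900-950 points in 10 s corresponds to roughly 90-95 Hz, well below the 125 Hz the sketch was configured for.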

Maybe it compresses the data to imitate the higher frequency when the actual one is lower than requested, but I don't know; it still confuses me.

At this point, that's all I can say for sure. I also mixed up inferencing and live classification when writing this post. I have not fully tested everything, so I can't yet say whether things work or not, and there are several things I still haven't tried. I will provide more information once I have tested it fully and seen whether the results are affected.

Best regards
Mikael

Eoin · January 27, 2025, 3:21pm

Great, thanks. I don't fully know what's going on here, but maybe it's a timing issue?

What baud rate are you using?
Try setting it as high as you can, e.g.:

Serial.begin(115200);
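To check whether the baud rate itself can cap the sample rate: with the common 8N1 serial framing, each byte costs 10 bits on the wire, so the link carries at most baud / 10 bytes per second. The sketch below (hypothetical helper; the example line is made up, with six values and the CR LF that Arduino's Serial.println appends) estimates an upper bound on lines per second:

```python
def max_lines_per_second(baud: int, line: str) -> float:
    """Upper bound on lines/s over a UART with 8N1 framing (10 bits per byte)."""
    bytes_per_second = baud / 10
    return bytes_per_second / len(line.encode("ascii"))


# Hypothetical accelerometer + gyroscope line, 40 bytes including CR LF.
line = "-0.123,9.812,0.031,0.015,-0.021,-0.004\r\n"
print(max_lines_per_second(115200, line))  # 288.0
print(max_lines_per_second(9600, line))    # 24.0
```

At 9600 baud such lines top out around 24 per second, far below 125 Hz, while 115200 baud leaves headroom. Real throughput will be lower than this bound once sensor reads and other delays are added.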

Let me know when you've tried that. Another thing that reduces the noise is to enable the silent flag:

edge-impulse-data-forwarder --frequency 100 --silent

Hopefully you get a better result. Make sure to try alternate cables, operating systems, and boards if you have them. It could be a software or hardware fault we aren't considering.

Best

Eoin