I’m looking into developing a model using data that has been sampled non-periodically. I have numerous CSV files that represent the same amount of time (e.g. a 10 second time slice), but each file’s column of timestamps is not the same since the data/sensor was not sampled/published periodically.
Consider two files that represent 10 seconds of data, but one file contains 100 samples and another file contains with 60 samples.
What is the suggested approach for handling/post-processing csv files that represent a common “time slice” but have a different number of samples in the CSV?
Are you saying that the sample rate differs among the different CSV files? For example, two rows might be 10ms apart in one file but 15ms apart in another file.
Or, are you saying that the sample rate is the same (e.g. all rows are 10ms apart), but each CSV file has a different total length (one CSV file has 100 rows but another has 60 rows)?
I think it’s the latter, but I wanted to make sure. If it is indeed the second case, you should be fine to upload the samples to Edge Impulse. On the Impulse Design page, you can adjust the window size and stride so that the e.g. 100-row file is broken into multiple samples. Hope that helps!
It’s actually more aligned with the former: the sample rate for each row is not periodic, the samples come in bursts and we simply collect over a certain amount of time. So, while collecting n number of events (1 event = 1 CSV file) with each event lasting t seconds, the number of samples for each event is not always the same. *sorry if I’m making this even more confusing … *
So, for example, one file might have 1000 rows and another file might have 1200 rows, but, regardless, each file represents the same total time. Moreover, the samples in each file are not always periodic.
It appears Edge Impulse looks at the difference between the 1st and 2nd row’s timestamp to establish a sample rate, then it looks at the number of samples to determine the total time. So, for our case, to format the CSV files in a periodically-sampled manner, we have periodically padded the file with a new value once the data changes. Another thought is to interpolate via linear or polynomial fit. But, I wasn’t sure if Edge Impulse provided any guidance for using non-periodic data.
@davidnorwood as long as the timestamps are always monotonically increasing, you should be able to normalize the sampling frequency. That is, interpolate all feature values at consistent intervals.This will create a kind of “virtual” sampling frequency that is sufficiently consistent to be fed to an ML model.
On the ingestion side, we can deal with non-periodic inputs as long as timestamps increase monotonically but once you deploy, you will have to figure out how to fill your inference buffer in a periodic way (real or virtual) since ML algos expect a consistent quantization in either time or space (as is the case with images)