What are the features from the Raw Data block?

I’m missing it in the docs. When I pass four measurements into the Raw Data block I get back twelve, features 0, 1, and 2 for each input. And some of the differences are interesting, except I don’t know what I’m looking at.

Can someone help me understand what those features actually are? I’m guessing that Feature 0 is the ‘raw’ value, but right now that’s just a guess. The others are mysterious to me.

Sorry if I’m being obtuse, but I can’t find this anywhere.

Thanks!

Art

@Acb3

This docs page may give more insight.

The Raw Data block generates windows from data samples without any specific signal processing. It is great for signals that have already been pre-processed and if you just need to feed your data into the Neural Network block.

Below is an example with IMU data:

Raw ‘features’ (Ax Ay Az Gx Gy Gz Ax Ay …):
-0.0271, 0.0731, -0.0143, -0.0007, 0.0146, 0.0160, -0.0238, 0.0631

Processed ‘features’ (Ax Ay Az Gx Gy Gz Ax Ay …):
-0.0271, 0.0731, -0.0143, -0.0007, 0.0146, 0.0160, -0.0238, 0.0631

As you can see, with scale axes = 1 the processed ‘features’ equal the raw ‘features’.

The raw block doesn’t extract ‘features’. It is more of a ‘feedthrough’ block.
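To illustrate the feedthrough idea, here is a minimal Python sketch. It is not the actual block code: the function name `raw_feedthrough` is made up, and it assumes the block only flattens the window sample by sample (Ax Ay Az Gx Gy Gz Ax Ay …) and multiplies each value by the scale axes setting.

```python
from typing import List, Sequence


def raw_feedthrough(window: Sequence[Sequence[float]],
                    scale_axes: float = 1.0) -> List[float]:
    """Hypothetical stand-in for the Raw Data block (an assumption, not the real code).

    `window` holds one row per timestamp and one value per axis.
    The output is the same values, flattened row by row and scaled.
    """
    return [value * scale_axes for sample in window for value in sample]


# One IMU sample from the example above (Ax Ay Az Gx Gy Gz).
window = [[-0.0271, 0.0731, -0.0143, -0.0007, 0.0146, 0.0160]]

# With scale axes = 1 the processed features are identical to the raw ones.
processed = raw_feedthrough(window, scale_axes=1.0)
print(processed == window[0])
```

With a longer window, the next timestamp's values simply follow the first six, which is why the number of features grows with the window size.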

I hope this helps.

Regards,
J.

Thanks Joeri, but the docs yield no insight on this at all. Which may be my shortcoming, but in any event I still need help. Please let me be more specific:

My training set is five columns wide… timestamp, temp, hum, press, and gas. I would expect, as you describe, that the “raw” pre-processor would simply pass through the four independent columns.

Instead, when I go into the anomaly detection block it offers me twelve features, including “temp Feature 0”, “temp Feature 1”, and “temp Feature 2”. Likewise for hum, press, and gas.

So what I don’t understand, and what the docs don’t seem to tell, is what is the nature of those three features generated for each input?

Thanks again!

Art

@Acb3 I have never used anomaly detection in combination with a raw data block.
It is also not clear to me what you are trying to do.

You are feeding the raw data to both the learning block and the anomaly block.

But how many timestamps do you have?

Feature 0 is at t0
Feature 1 is at t1
Feature 2 is at t2

@louis @shawn_edgeimpulse Am I correct?

Hello @Acb3,

Here is the code behind the raw data processing block:

Also, @Acb3, could you try setting the window increase to the same value as the window size on the Create impulse page?

I am wondering if your issue comes from there (not 100% sure, though).

Regards,

Louis

@Acb3

I hope this example gives you some insights.

I set my window size to 100 ms and my sample rate is 100 Hz.

In this case I get Feature 0 through Feature 9 (10 in total) for each of my accelerometer axes (accX, accY, and accZ).

As @louis said, you probably need to change the window size.
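The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, and it assumes the number of features per axis is simply the window length multiplied by the sample rate, rounded to the nearest whole sample.

```python
from typing import List


def samples_per_window(window_ms: float, freq_hz: float) -> int:
    """Samples (and so raw features) per axis in one window.

    Assumption: window length times sample rate, rounded to the nearest integer.
    """
    return int(round(window_ms / 1000.0 * freq_hz))


def feature_names(axes: List[str], n_samples: int) -> List[str]:
    """Labels in the style shown by the Studio, grouped per axis for readability."""
    return [f"{axis} Feature {i}" for axis in axes for i in range(n_samples)]


# The example above: 100 ms window at 100 Hz gives Feature 0 to Feature 9 per axis.
print(samples_per_window(100, 100))          # 10

# A sensor sampled every 7 s (0.14286 Hz) with a 28000 ms window.
print(samples_per_window(28000, 0.14286))    # 4

# Three samples per window on four inputs would give twelve features.
names = feature_names(["temp", "hum", "press", "gas"], 3)
print(len(names), names[:3])
```

Under the same assumption, three features per input at 0.14286 Hz would correspond to a window of roughly 21 seconds.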


OK, so I reduced my window size to one seven-second sample, which I’m thinking should reduce the width of the output features… but now the Generate Features function fails:

Scheduling job in cluster...
Job started
Reducing dimensions for visualizations...
ERR: Found array with 0 feature(s) (shape=(2000, 0)) while a minimum of 1 is required.
Application exited with code 1

Job failed (see above)

Would be helpful, I think, if the error specified which array has zero features.

My aim here is to see what I can learn about my raw data before venturing into derived features.

Thanks @joeri and @louis!

@Acb3,

What is your frequency and what is your new window size?
Indeed, the error is not very clear. I will create an internal ticket to see how we can improve that.

Regards,

Louis

@Louis,

My frequency is 0.14286 Hz, window is now 28000 ms, increase is 7000 ms.

I’ve run into more than a few unchecked and unexplained Python errors as I’ve been experimenting. I think most of them have to do with dimensionality issues, which I’m thinking might be worth trapping.

Thanks!

@Acb3,

I can see that you have only one data sample in your dataset. Could you try splitting that single long sample into several smaller ones? Also make sure they all have the same frequency. You can see the frequency by expanding the data acquisition view; a tooltip appears when you hover over the axis.

Regards,

Louis

Right, @louis… What I expect you’re seeing is one-third of my first-round dataset. All the data is sampled every seven seconds, which is 0.142 Hz. Each segment of my dataset is about 45K records… what do you think would be a more amenable set-size? Thanks!

Hello @Acb3,

You expect a frequency of 0.142 Hz, but the data uploaded to the Studio shows 0.137 Hz. Can you make sure you don’t have empty “rows” or any missing data? That might explain the ERR: Found array with 0 feature(s).

You can try splitting your data sample into smaller files (like 30 min or 1 h) instead of one file of 88 h.
It might also be easier to spot where you have missing “rows”.
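A quick way to check this before uploading is sketched below in Python. The helper names are hypothetical, the timestamps are assumed to be in milliseconds and sorted, and the frequency formula is only one plausible definition (it may not match how the Studio computes the value it displays).

```python
from typing import List, Tuple


def effective_frequency_hz(timestamps_ms: List[float]) -> float:
    """Average sample rate implied by the first and last timestamps."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    duration_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    return (len(timestamps_ms) - 1) / duration_s


def find_gaps(timestamps_ms: List[float], expected_interval_ms: float,
              tolerance: float = 1.5) -> List[Tuple[int, float]]:
    """Return (row index, interval) wherever the step to a row is too long."""
    gaps = []
    for i in range(1, len(timestamps_ms)):
        delta = timestamps_ms[i] - timestamps_ms[i - 1]
        if delta > tolerance * expected_interval_ms:
            gaps.append((i, delta))
    return gaps


def split_by_duration(timestamps_ms: List[float], chunk_ms: float) -> List[List[int]]:
    """Group row indices into consecutive chunks, each spanning less than chunk_ms."""
    chunks: List[List[int]] = []
    start = None
    for i, t in enumerate(timestamps_ms):
        if start is None or t - start >= chunk_ms:
            chunks.append([])  # begin a new chunk at this row
            start = t
        chunks[-1].append(i)
    return chunks


# Made-up timestamps: one reading every 7 s, with the reading at 21 s missing.
ts = [0, 7000, 14000, 28000, 35000]
print(effective_frequency_hz(ts))   # lower than 1/7 Hz because of the missing row
print(find_gaps(ts, 7000))          # [(3, 14000)]
print(split_by_duration(ts, 21000)) # [[0, 1, 2], [3, 4]]
```

For 30-minute files, `chunk_ms` would be `30 * 60 * 1000`; each list of row indices can then be written out as its own file.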

Regards,

Louis