Class values - Regression Model

Hello,

I am verifying a model trained locally vs. a model trained in EI.
I have a small question just to be sure.

I perform:

import numpy as np

# download_data is my helper that fetches the raw .npy bytes from the EI API
Y = download_data('https://studio.edgeimpulse.com/v1/api/{projectID}/training/{learnID}/y')
with open('y_train.npy', 'wb') as file:
    file.write(Y)
Y = np.load('y_train.npy')[:, 0]

and get the following:
print(set(list(Y))) gives
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89}

My class values are:
My classes_values = [ "85", "86", "87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", "98", "99", "100", "101", "102", "103", "104", "105", "106", "107", "108", "109", "110", "111", "112", "113", "114", "115", "116", "117", "118", "119", "120", "121", "122", "123", "124", "125", "126", "127", "128", "129", "130", "131", "132", "133", "134", "135", "136", "137", "138", "139", "140", "141", "142", "143", "144", "145", "146", "147", "148", "149", "150", "151", "152", "153", "154", "155", "156", "157", "158", "159", "160", "161", "162", "163", "164", "165", "166", "167", "168", "169", "170", "171", "172", "174" ]

Is this interpretation correct:
If Y = 1 → class value = 85
if Y = 2 → class value = 86

if Y = 10 → class value = 94
etc

In other words, are the class values always in ascending order?

If I, for example, add new extra data to EI and one of these samples has a classes_value not in the list above, for example 83, will my new classes_values become:

My classes_values = [ "83", "85", "86", "87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", "98", "99", "100", "101", "102", "103", "104", "105", "106", "107", "108", "109", "110", "111", "112", "113", "114", "115", "116", "117", "118", "119", "120", "121", "122", "123", "124", "125", "126", "127", "128", "129", "130", "131", "132", "133", "134", "135", "136", "137", "138", "139", "140", "141", "142", "143", "144", "145", "146", "147", "148", "149", "150", "151", "152", "153", "154", "155", "156", "157", "158", "159", "160", "161", "162", "163", "164", "165", "166", "167", "168", "169", "170", "171", "172", "174" ]

and will Y be “redefined” as follows:
If Y = 1 → class value = 83
if Y = 2 → class value = 85

if Y = 10 → class value = 93
etc
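
To make my assumption concrete, this is the lookup I have in mind (purely hypothetical; it is exactly what I am asking to confirm):

# Hypothetical mapping (the assumption in question):
# Y holds 1-based label indices, classes_values is sorted ascending,
# and index i in Y corresponds to classes_values[i - 1].
classes_values = ["85", "86", "87"]  # truncated for the example

def label_to_class(y):
    return classes_values[int(y) - 1]

print(label_to_class(1))  # -> "85" if the assumption holds
print(label_to_class(2))  # -> "86"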

Regards,
J.

There are two things to chat about here…

Firstly, I think we might have a slight mismatch in terminology regarding regression vs. classification.

In a regression model the output is a single real value, e.g. in the example project (Dashboard - Tutorial: temperature regression - Edge Impulse) the output is a temperature value. It can take any value: 3.04, 7.2, 1.45.

In a classification model the output is one of a discrete set of categorical class values. e.g. {cat, dog, frog}
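
To put that difference in Keras terms, a purely illustrative sketch (not code from the studio):

from tensorflow.keras.layers import Dense

# Regression head: a single linear unit, trained with e.g. a mean squared error loss.
regression_head = Dense(1)

# Classification head: one unit per class with a softmax, trained with a cross-entropy loss.
classification_head = Dense(3, activation='softmax')  # e.g. cat / dog / frog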

So in a regression model there are no class values as such, and so there's no need to do any mapping. (Having said that, I know we've got some wording bugs in the UI where we refer to classes in the regression model flow, so I've raised an internal bug for that; apologies if that caused confusion.)

Secondly, be careful when you do something like set(list(y)): the ordering you see when the set is printed is incidental, and it has no meaning in relation to any mapping.

>>> set([3,1,4,1,5,9,2,6])
{1, 2, 3, 4, 5, 6, 9}

Let me know if this helps, or if you have further questions.

Cheers,
Mat

@matkelcey thanks for the feedback.

Concerning regression and classification.

You wrote: "So in a regression model, there are no class values as such, and so there's no need to do any mapping."

If I take the temperature regression tutorial as an example:

# model architecture
model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(classes, name='y_pred'))

...
...

classes_values = [ "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60" ]
classes = len(classes_values)

Y = tf.keras.utils.to_categorical(Y - 1, classes)
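
As a small illustration of what that last line does to the labels (my own example, not from the EI-generated code):

import numpy as np
import tensorflow as tf

y = np.array([1, 2, 5])
print(tf.keras.utils.to_categorical(y - 1, num_classes=5))
# [[1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 0. 0. 1.]]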

If my interpretation of the code is correct, EI currently solves a regression problem as a classification problem, so I can understand why EI refers to classes.

I strongly suspect, though, that this model will perform better if you just regress the single value directly (since your output "classes" represent a continuum of evenly spaced values). It will certainly reduce the number of parameters by a fair bit.

i.e.

model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, name='y_pred'))

with a switch to a MeanSquaredError loss.
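
Fully spelled out, that looks something like this (a rough sketch only; the optimizer, metric and dummy data are my placeholders, not what the studio generates):

import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Direct regression: a single linear output unit instead of one unit per class.
model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, name='y_pred'))

model.compile(loss=tf.keras.losses.MeanSquaredError(),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['mae'])

# Targets stay as the raw values, i.e. no to_categorical step.
model.fit(np.random.rand(100, 4), np.random.rand(100, 1), epochs=5, verbose=0)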

You get very little generalisation across the output values if you treat them categorically.

Cheers,
Mat