How to reapply rescaling when you want to predict your data

Dataiker

I created a Decision Tree model and I want to export it so I can use it outside of Dataiku.



I took the pickle file and loaded it into Python to continue using it there.




import pickle

# Load the exported model; the context manager closes the file afterwards
with open('clf.pkl', 'rb') as f:
    loaded_model = pickle.load(f, encoding='latin1')


In the model settings, I used standard rescaling, which uses avgstd (mean and standard deviation). Dataiku also exports this JSON file with details about the rescaling:




{
  "shifts": [
    4.2708957215287455,
    5.582300530732055,
    4.721780769116731,
    6.309030531691733,
    4.534705132386515,
    50183.866161634876,
    4.628957297141036,
    5.931597829632046,
    1.834355009673187,
    21814.135528393213,
    0.9999925875959688,
    0.23165941222883746,
    -0.11146363232269413
  ],
  "columns": [
    "col1",
    "col2",
    "col3",
    "col4",
    "col5",
    "col6",
    "col7",
    "col8",
    "col9",
    "col10",
    "col11",
    "col12",
    "col13"
  ],
  "inv_scales": [
    0.29041217789420476,
    0.32026605114154144,
    0.3398879256267485,
    0.2539738260220278,
    0.27817344479641604,
    1.1217850173179438e-05,
    0.3181917203503525,
    0.2886476076886483,
    0.37842451508835384,
    2.329233011164756e-05,
    0.21830904186227362,
    2.003574563132119,
    1.386943696546877
  ]
}


Let's say I have a new input with the original values (before rescaling). How can I use the information above to rescale all the features of that new input so I can predict on it?

Dataiker
Hi,

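For each column, subtract its shift and multiply by its inv_scale (the entries at the same position in the JSON arrays):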

rescaled_feature = (input_feature - shift) * inv_scale
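
As a minimal sketch of that formula in Python: the `rescaler` dictionary below is a shortened, illustrative copy of the exported JSON (first three columns only), and `rescale_row` and `new_input` are hypothetical names with made-up input values. In practice you would load the full JSON file with `json.load` instead.

```python
# Illustrative excerpt of the exported rescaling JSON (first three columns only).
rescaler = {
    "shifts": [4.2708957215287455, 5.582300530732055, 4.721780769116731],
    "columns": ["col1", "col2", "col3"],
    "inv_scales": [0.29041217789420476, 0.32026605114154144, 0.3398879256267485],
}


def rescale_row(row, rescaler):
    """Rescale one raw input row (a dict keyed by column name).

    Applies (value - shift) * inv_scale per column and returns the
    values as a list in the order given by rescaler["columns"].
    """
    return [
        (row[col] - shift) * inv_scale
        for col, shift, inv_scale in zip(
            rescaler["columns"], rescaler["shifts"], rescaler["inv_scales"]
        )
    ]


# Made-up raw values (before rescaling) for a single new input.
new_input = {"col1": 5.0, "col2": 5.582300530732055, "col3": 4.0}
rescaled = rescale_row(new_input, rescaler)
```

The resulting list could then be passed to the model as `loaded_model.predict([rescaled])`. This assumes the model expects its features in the same order as the "columns" array and that every feature it uses appears in the JSON, which is worth checking against your own export.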