
How to reapply rescaling when you want to predict your data

Dataiker

I created a Decision Tree model and I want to export it so I can use it outside of Dataiku.



I took the pickle file and loaded it into Python so I can keep using the model there.




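import pickle

# Open in binary mode; the context manager closes the file afterwards.
with open('clf.pkl', 'rb') as f:
    # encoding='latin1' is needed when the pickle was written under Python 2
    loaded_model = pickle.load(f, encoding='latin1')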


In the model settings, I used standard rescaling (the AVGSTD method). Dataiku also exports this JSON file with the rescaling details:




{
  "shifts": [
    4.2708957215287455,
    5.582300530732055,
    4.721780769116731,
    6.309030531691733,
    4.534705132386515,
    50183.866161634876,
    4.628957297141036,
    5.931597829632046,
    1.834355009673187,
    21814.135528393213,
    0.9999925875959688,
    0.23165941222883746,
    -0.11146363232269413
  ],
  "columns": [
    "col1",
    "col2",
    "col3",
    "col4",
    "col5",
    "col6",
    "col7",
    "col8",
    "col9",
    "col10",
    "col11",
    "col12",
    "col13"
  ],
  "inv_scales": [
    0.29041217789420476,
    0.32026605114154144,
    0.3398879256267485,
    0.2539738260220278,
    0.27817344479641604,
    1.1217850173179438e-05,
    0.3181917203503525,
    0.2886476076886483,
    0.37842451508835384,
    2.329233011164756e-05,
    0.21830904186227362,
    2.003574563132119,
    1.386943696546877
  ]
}


Let's say I have a new input with the original values (before rescaling). How can I use the information above to rescale all of its features so that I can run a prediction on it?

Dataiker
Hi,

For each column, take the shift and inv_scale at that column's position in the "shifts" and "inv_scales" lists:

rescaled_feature = (input_feature - shift) * inv_scale
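
As a rough illustration, the formula could be applied in Python along these lines. This is only a sketch: the helper `rescale_row`, the dict-based input format, and the sample values are made up for the example, and only the first two columns of the exported JSON are reproduced. It also assumes the model expects the features in the order given by "columns" and that no other preprocessing step is involved.

```python
import json


def rescale_row(row, rescaling):
    """Apply (value - shift) * inv_scale to each column named in `rescaling`.

    `row` maps column name -> raw value (a hypothetical input format).
    Returns the rescaled values as a list ordered like rescaling["columns"].
    """
    return [
        (row[col] - shift) * inv_scale
        for col, shift, inv_scale in zip(
            rescaling["columns"], rescaling["shifts"], rescaling["inv_scales"]
        )
    ]


# Truncated copy of the exported JSON (first two columns only), for illustration.
rescaling = json.loads("""
{
  "shifts": [4.2708957215287455, 5.582300530732055],
  "columns": ["col1", "col2"],
  "inv_scales": [0.29041217789420476, 0.32026605114154144]
}
""")

# Made-up raw values; col1 is set to its shift, so it rescales to 0.0.
new_input = {"col1": 4.2708957215287455, "col2": 6.0}
features = rescale_row(new_input, rescaling)
```

With the full JSON loaded from the exported file, the resulting list could presumably be passed to the unpickled model as `loaded_model.predict([features])`, provided the column order matches the one used in training.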