Hello everyone,
Hope you're doing well!
I'm trying to optimize the following Python code, which takes more than 6 h 40 min to score 4.5M rows:
# Compute recipe outputs from inputs
input_json = json.loads(input_df.fillna("").to_json(orient='index'))
row = 0
data = []
while row < input_df.shape[0]:
    prediction = client.predict_record("random_forest", input_json[str(row)])
    prediction["result"]['Id_client'] = input_json[str(row)]['Id_client']
    prediction["result"]['proba_1'] = prediction["result"]['probas']['1']
    row += 1
    data.append(prediction["result"])
application_scored_df = pd.DataFrame(data)
# Write recipe outputs
application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(application_scored_df)
I wrote this version, which uses the function "predict_records" instead of "predict_record":
# predict_records takes a list of dictionaries, each keyed by 'features': [{'features': {...}}, {'features': {...}}, ...]
l=[]
s=['features']
for i in range(0, input_df.shape[0]):
    l0 = input_df.iloc[[i,]].set_index([s])
    dicti = l0.to_dict('index')
    l.append(dicti)
prediction_test = client.predict_records("random_forest", l)
prediction_df=pd.DataFrame(prediction_test['results'])
prediction_df.insert(0, 'Id_client', input_df['Id_client'])
prediction_df['proba_1']=prediction_df['probas'].apply(pd.Series)[['1']]
application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(prediction_df)
The code works very well on 5 rows, but once applied to the full dataset, I get this error:
Do you know what this error means and how I can solve it to reduce the execution time?
Thanks in advance,
Kenza
Hi,
For your actual error, could you please attach the full log (Actions > View full job log)?
Since your code is running within DSS, do you really need to call an API node? You could also use a scoring recipe locally, which would be much faster.
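If you do end up keeping the API node, one thing that may be worth trying is sending the records in batches rather than all 4.5M in a single predict_records call. The cause of your error can't be confirmed without the log, so treat this only as a rough sketch under assumptions: the helper names (to_records, chunked, score_in_batches) are hypothetical, the batch size of 1000 is an arbitrary starting point, and predict_batch stands for whatever function wraps your client.predict_records call.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def to_records(rows: List[Row]) -> List[Dict[str, Row]]:
    """Wrap each row as {'features': row}, the shape described in the post."""
    return [{"features": row} for row in rows]


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_in_batches(
    rows: List[Row],
    predict_batch: Callable[[List[Dict[str, Row]]], Dict[str, Any]],
    batch_size: int = 1000,  # assumption: tune to what the API node accepts
) -> List[Any]:
    """Call predict_batch once per chunk and collect the 'results' in order."""
    results: List[Any] = []
    for batch in chunked(to_records(rows), batch_size):
        response = predict_batch(batch)
        results.extend(response["results"])
    return results


# Hypothetical usage inside the recipe (not run here):
# rows = input_df.fillna("").to_dict(orient="records")
# results = score_in_batches(
#     rows, lambda recs: client.predict_records("random_forest", recs)
# )
# prediction_df = pd.DataFrame(results)
```

Building the rows with to_dict(orient="records") also avoids the per-row iloc loop, which is likely to be slow on its own for millions of rows. The results come back in the same order as the input, so Id_client can still be attached afterwards as in your code.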
To use the scoring recipe, you have to be in the same flow as the one where the model was implemented, am I wrong?
In my case, the model is in the development flow and I want to use it in another flow (the production one), which is why I am using the API node.