Hello everyone,
Hope you're doing well!
I am trying to optimize the following Python code, which takes more than 6h40min for 4.5M rows:
# Compute recipe outputs from inputs
input_json = json.loads(input_df.fillna("").to_json(orient='index'))
row = 0
data = []
while row < input_df.shape[0]:
    prediction = client.predict_record("random_forest", input_json[str(row)])
    prediction["result"]['Id_client'] = input_json[str(row)]['Id_client']
    prediction["result"]['proba_1']=prediction["result"]['probas']['1']
    row += 1
    data.append(prediction["result"])
application_scored_df = pd.DataFrame(data)
# Write recipe outputs
application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(application_scored_df)
I wrote this code, which uses the function "predict_records" instead of "predict_record":
# predict_records takes a list of dictionaries, each keyed by 'features': [{'features': {...}}, {'features': {}}, ...]
l=[]
s=['features']
for i in range(0, input_df.shape[0]) :
    l0=input_df.iloc[[i,]].set_index([s])
    dicti=l0.to_dict('index')
    l.append(dicti)
prediction_test = client.predict_records("random_forest", l)
prediction_df=pd.DataFrame(prediction_test['results'])
prediction_df.insert(0, 'Id_client', input_df['Id_client'])
prediction_df['proba_1']=prediction_df['probas'].apply(pd.Series)[['1']]
application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(prediction_df)
The code works very well on 5 rows, but once applied to the full dataset, I get this error:
Do you know what this error means and how I can solve it to reduce execution time?
Thanks in advance,
Kenza
Hi,
For your actual error, could you please attach the full log (Actions > View full job log)?
Since your code is running within DSS, do you really need to call an API node? You could also use a scoring recipe locally, which would be much faster.
To use the scoring recipe, you have to be in the same flow as the one where the model was implemented, or am I wrong?
In my case, the model is in the development flow and I want to use it in another flow (the production one), which is why I am using the API node.
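If the API node has to stay, one option that may be worth trying is to send the rows in smaller batches instead of making a single "predict_records" call for all 4.5M rows. The error itself is not shown in this thread, so whether request size is the cause is only an assumption. Below is a minimal sketch: the helper names (to_records, iter_batches, predict_in_batches) and the batch size of 1000 are hypothetical and not part of the Dataiku API. The only things taken from the thread are the {'features': {...}} record shape and the 'results' key in the response.

```python
from typing import Callable, Dict, Iterator, List


def to_records(rows: List[dict]) -> List[Dict[str, dict]]:
    """Wrap each row dict as {'features': row}, the record shape used in the thread."""
    return [{"features": row} for row in rows]


def iter_batches(records: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def predict_in_batches(
    predict_fn: Callable[[List[dict]], dict],
    records: List[dict],
    batch_size: int = 1000,  # hypothetical value, to be tuned
) -> List[dict]:
    """Call predict_fn once per batch and concatenate the 'results' lists in order."""
    results: List[dict] = []
    for batch in iter_batches(records, batch_size):
        response = predict_fn(batch)
        results.extend(response["results"])
    return results


# Possible usage inside the recipe (client and input_df as defined in the thread):
# rows = input_df.fillna("").to_dict(orient="records")
# results = predict_in_batches(
#     lambda batch: client.predict_records("random_forest", batch),
#     to_records(rows),
#     batch_size=1000,
# )
# prediction_df = pd.DataFrame(results)
```

Building the rows with to_dict(orient="records") also avoids the per-row iloc loop, which is likely to be slow on its own at this scale. The results come back in the same order as the input, so the Id_client column can still be attached by position afterwards.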