API score Python Code optimization

Hello everyone, 

Hope you're doing well!

I am trying to optimize the following Python code, which takes more than 6 h 40 min to score 4.5M rows:

# `client` (the API node client) and `input_df` are assumed to be defined earlier in the recipe
import json

import dataiku
import pandas as pd

# Compute recipe outputs from inputs
input_json = json.loads(input_df.fillna("").to_json(orient='index'))
data = []
for row in range(input_df.shape[0]):
    record = input_json[str(row)]
    # One API call per row
    prediction = client.predict_record("random_forest", record)
    result = prediction["result"]
    result['Id_client'] = record['Id_client']
    result['proba_1'] = result['probas']['1']
    data.append(result)

application_scored_df = pd.DataFrame(data)


# Write recipe outputs
application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(application_scored_df)

I wrote this version, which uses "predict_records" instead of "predict_record":

# predict_records takes a list of dictionaries, each with a 'features' key: [{'features': {...}}, {'features': {...}}, ...]
records = [{'features': features} for features in input_df.to_dict('records')]

prediction_test = client.predict_records("random_forest", records)
prediction_df = pd.DataFrame(prediction_test['results'])
prediction_df.insert(0, 'Id_client', input_df['Id_client'])
# Extract the probability of class '1' from each 'probas' dictionary
prediction_df['proba_1'] = prediction_df['probas'].apply(lambda probas: probas['1'])

application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(prediction_df)

The code works well on 5 rows, but once applied to the full dataset I get this error:

[Screenshot: err_dss.PNG]

Do you know what this error means and how I can solve it to reduce the execution time?

Thanks in advance,

Kenza 

Dataiker

Hi,

For the error itself, could you please attach the full log (Actions > View full job log)?

Since your code is running within DSS, do you really need to call an API node? You could also use a scoring recipe locally, which would be much faster.

Author

To use the scoring recipe, you have to be in the same flow as the one where the model was built, or am I wrong?

In my case, the model is in the development flow and I want to use it in another flow (the production one), which is why I am using the API node.
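
If the API node has to stay, one thing that may be worth trying is sending the rows in chunks instead of passing all 4.5M records to a single predict_records call. This is only a sketch: the error screenshot is not readable here, so request size being the cause is an assumption, not a diagnosis. The helper name `score_in_chunks`, the `chunk_size` parameter and its default of 1000 are hypothetical, and the prediction call is passed in as a callable so the helper can be tried without an API node.

```python
import json

import pandas as pd


def score_in_chunks(predict_records, input_df, id_col="Id_client", chunk_size=1000):
    """Score input_df chunk by chunk.

    predict_records: callable taking a list of {'features': {...}} dicts and
    returning a dict with a 'results' list (hypothetical wrapper around the client).
    chunk_size: arbitrary illustrative value, to be tuned.
    """
    frames = []
    for start in range(0, len(input_df), chunk_size):
        chunk = input_df.iloc[start:start + chunk_size]
        # Same JSON round-trip as the first snippet, so values are JSON-serialisable
        rows = json.loads(chunk.fillna("").to_json(orient="records"))
        records = [{"features": row} for row in rows]
        response = predict_records(records)
        scored = pd.DataFrame(response["results"])
        # Assumes results come back in the same order as the records sent
        scored.insert(0, id_col, chunk[id_col].to_numpy())
        scored["proba_1"] = scored["probas"].apply(lambda probas: probas["1"])
        frames.append(scored)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

In the recipe it could be called as `score_in_chunks(lambda recs: client.predict_records("random_forest", recs), input_df)`, and the result written with `write_with_schema` as before.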
