API score Python Code optimization

Kenza98 Partner, Registered Posts: 7 Partner
edited July 16 in Using Dataiku

Hello everyone,

Hope you're doing well!

I am trying to optimize the following Python code, which takes more than 6h40min to score 4.5M rows:

# Compute recipe outputs from inputs
input_json = json.loads(input_df.fillna("").to_json(orient='index'))
row = 0
data = []    
while row < input_df.shape[0]:
    prediction = client.predict_record("random_forest", input_json[str(row)])
    prediction["result"]['Id_client'] = input_json[str(row)]['Id_client']
    prediction["result"]['proba_1']=prediction["result"]['probas']['1']
    row += 1
    data.append(prediction["result"])

application_scored_df = pd.DataFrame(data)


# Write recipe outputs
application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(application_scored_df)

To speed it up, I wrote this version, which uses the function "predict_records" instead of "predict_record":

# predict_records takes a list of dictionaries, each keyed by 'features': [{'features': {...}}, {'features': {}}, ...]
l=[]
s=['features']
for i in range(0, input_df.shape[0]) :
    l0=input_df.iloc[[i,]].set_index([s])
    dicti=l0.to_dict('index')
    l.append(dicti)

prediction_test = client.predict_records("random_forest", l)
prediction_df=pd.DataFrame(prediction_test['results'])
prediction_df.insert(0, 'Id_client', input_df['Id_client'])
prediction_df['proba_1']=prediction_df['probas'].apply(pd.Series)[['1']]

application_score = dataiku.Dataset("APPLICATION_SCORE")
application_score.write_with_schema(prediction_df)

This code works very well on 5 rows, but once I apply it to the full dataset, I get this error:

[Attached screenshot: err_dss.PNG]

Do you know what this error means and how I can solve it to reduce the execution time?

Thanks in advance,

Kenza
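
For reference, the payload shape noted in the code comment above (a list of dictionaries, each keyed by 'features') can be built in a single pass from plain row dictionaries, rather than row by row with `iloc`, which is likely to be slow on millions of rows. The sketch below is only an illustration: `to_feature_records` and the sample rows are hypothetical names and data, not part of the original recipe.

```python
from typing import Any, Dict, Iterable, List


def to_feature_records(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Dict[str, Any]]]:
    """Wrap each row dictionary as {'features': row}."""
    return [{"features": dict(row)} for row in rows]


# Hypothetical sample rows. With pandas, the rows could instead come from
# input_df.fillna("").to_dict("records").
sample_rows = [
    {"Id_client": 1, "age": 34},
    {"Id_client": 2, "age": 51},
]

records = to_feature_records(sample_rows)
print(records[0])
```

The resulting list has one entry per input row, in the same order, so the results can later be matched back to `Id_client` by position.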

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    Hi,

    For your actual error, could you please attach the full log (Actions > View full job log)?

    Since your code is running within DSS, do you really need to call an API node? You could also use a scoring recipe locally, which would be much faster.

  • Kenza98
    Kenza98 Partner, Registered Posts: 7 Partner

    To use the scoring recipe, don't you have to be in the same flow as the one where the model was implemented?

    In my case, the model is in the development flow and I want to use it in another flow (the production one), which is why I am using the API node.
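
If the API node has to stay in the loop, one option worth trying is to send the records in fixed-size chunks instead of as a single 4.5M-record request. The cause of the error is not known without the full log, so this is only a sketch under assumptions. The chunk size of 1000 is an arbitrary starting point. `chunked`, `predict_in_chunks` and `FakeClient` are hypothetical names, and `FakeClient` is a local stand-in that mimics only the `predict_records(endpoint_id, records)` call and the `'results'` key used in the code above.

```python
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]


def chunked(items: List[Record], size: int) -> Iterator[List[Record]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_chunks(client: Any, endpoint_id: str, records: List[Record],
                      chunk_size: int = 1000) -> List[Record]:
    """Call client.predict_records once per chunk and concatenate the results."""
    results: List[Record] = []
    for batch in chunked(records, chunk_size):
        response = client.predict_records(endpoint_id, batch)
        results.extend(response["results"])
    return results


class FakeClient:
    """Hypothetical stand-in for the API node client, for local testing only."""

    def __init__(self) -> None:
        self.calls = 0

    def predict_records(self, endpoint_id: str, records: List[Record]) -> Dict[str, Any]:
        self.calls += 1
        # Return one made-up result per record, shaped like the results above.
        return {"results": [{"prediction": "1", "probas": {"0": 0.2, "1": 0.8}}
                            for _ in records]}


fake = FakeClient()
records = [{"features": {"Id_client": i}} for i in range(2500)]
results = predict_in_chunks(fake, "random_forest", records, chunk_size=1000)
print(len(results), fake.calls)
```

Because the results are appended in request order, `results[i]` corresponds to `records[i]`, so `Id_client` and `proba_1` can be attached afterwards as in the original recipe. In the real recipe, `client` would replace `fake`.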
