
Create scoring recipe using the API


Hi,

I'm having trouble creating a scoring recipe through the API that keeps only some of the input features in the output dataset.
To get the engine and Spark configuration I want, I first created the recipe manually and retrieved its payload with recipe.get_settings().get_payload().

Then I wrote the following function, which creates the scoring recipe with the parameters I want:

 

from dataikuapi.dss.recipe import PredictionScoringRecipeCreator
import dataiku
import json

def creation_recipe_scoring(project, recipe_name, dataset_to_score, model_id,
                            output_dataset_name="scoring_temp_name",
                            output_connection="some_connection",
                            payload=None, keep_columns=None):
    """
    Create a scoring recipe and its output dataset using a deployed model.
    If the output dataset already exists, delete it and recreate it.

    Params
    :project object: a :class: dataikuapi.dss.project.DSSProject in which to create the recipe
    :recipe_name str: the name of the recipe
    :dataset_to_score str: the name of the dataset to score
    :model_id str: the ID of the deployed model that will score dataset_to_score
    :output_dataset_name str: the name of the output dataset
    :output_connection str: connection used to store the output dataset
    :payload unicode (str): the payload, as a JSON string
        If None, the default engine (in memory) is left unchanged.
        The payload has to be taken from a manually created recipe using:
        project.get_recipe('recipe').get_settings().get_payload()
    :keep_columns list: list of the columns to keep in the scored dataset

    Returns:
    :recipe_handle object: handle to the recipe just created
        A :class: dataikuapi.dss.recipe.DSSRecipe
    """
    def build_recipe():
        # Same builder calls in both the first attempt and the retry
        builder = PredictionScoringRecipeCreator(name=recipe_name, project=project)
        builder.with_input_model(model_id)
        builder.with_input(dataset_to_score)
        builder.with_new_output(output_dataset_name, output_connection)
        return builder.build()

    try:
        recipe_handle = build_recipe()
    except Exception as e:  # assumed to mean the output dataset already exists
        print(e)
        project.get_dataset(output_dataset_name).delete(drop_data=True)
        print('Dataset dropped')
        recipe_handle = build_recipe()

    if payload is not None or keep_columns is not None:
        print('Modifying payload')
        settings = recipe_handle.get_settings()
        # Start from the supplied payload, or from the recipe's own payload
        new_payload = json.loads(payload if payload is not None else settings.get_payload())
        if keep_columns is not None:
            # Only keep those columns; json.dumps writes them as JSON strings,
            # so no unicode() conversion is needed (unicode() is Python 2 only)
            new_payload['keptInputColumns'] = list(keep_columns)
        settings.set_payload(json.dumps(new_payload))
        # Older API equivalent: recipe_handle.get_definition_and_payload(),
        # def_payload.set_payload(...), recipe_handle.set_definition_and_payload(def_payload)
        settings.save()

        print('Payload of the recipe:\n{0}'.format(settings.get_payload()))

    print('Recipe set')

    return recipe_handle

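The `try`/`except` in the function above treats any exception raised by `build()` as "the output dataset already exists". A minimal sketch of a more explicit check, assuming a `dataikuapi` version in which `DSSDataset.exists()` is available; `drop_existing_output` is a hypothetical helper name:

```python
def drop_existing_output(project, output_dataset_name):
    """Delete the output dataset (and its data) if it already exists.

    `project` is expected to be a dataikuapi DSSProject; only
    get_dataset(), exists() and delete(drop_data=True) are called.
    Returns True if a dataset was deleted, False otherwise.
    """
    dataset = project.get_dataset(output_dataset_name)
    # Assumption: DSSDataset.exists() is available in your dataikuapi version
    if dataset.exists():
        dataset.delete(drop_data=True)
        return True
    return False
```

Calling it just before `build()`, instead of catching every exception, lets unrelated errors (a wrong model ID or connection name, for example) surface normally.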
 

But when the output dataset is built, it still contains all of my input columns.
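One way to narrow this down is to check the payload edit on its own, outside DSS. A minimal sketch, assuming the payload is the JSON string returned by `get_payload()` and that `keptInputColumns` is the key to edit (as in the function above); `set_kept_columns` is a hypothetical helper name:

```python
import json

def set_kept_columns(payload_json, keep_columns):
    """Return a copy of a scoring-recipe payload (JSON string) whose
    'keptInputColumns' entry lists only `keep_columns`.

    Only the key used in this post is touched; every other key of the
    payload is passed through unchanged.
    """
    payload = json.loads(payload_json)
    # json.dumps writes Python strings as JSON strings, so no explicit
    # unicode conversion is needed
    payload["keptInputColumns"] = list(keep_columns)
    return json.dumps(payload)
```

If the returned string contains the expected column list but the built dataset still has every input column, the problem lies in how the recipe interprets its payload rather than in the edit itself.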


I've compared all the settings of a manually created recipe and an API-created one, and everything is identical: recipe_settings, payload and status.
(All checked with the corresponding methods: recipe.get_settings().recipe_settings, recipe.get_settings().get_payload(), recipe.get_settings().get_status().get_engines_details() / .get_selected_engine_details().)
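Comparing the two recipes by eye is error-prone for nested payloads. A minimal sketch of a recursive comparison of two parsed JSON values; `diff_json` is a hypothetical helper, and its inputs would be, for example, `json.loads(...)` of each recipe's `get_payload()`:

```python
def diff_json(a, b, path=""):
    """Recursively compare two parsed JSON values.

    Returns a list of human-readable differences; an empty list means
    the two values are identical.
    """
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = "{0}.{1}".format(path, key) if path else str(key)
            if key not in a:
                diffs.append("{0}: only in second".format(sub))
            elif key not in b:
                diffs.append("{0}: only in first".format(sub))
            else:
                diffs.extend(diff_json(a[key], b[key], sub))
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        # Same-length lists: compare element by element
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(diff_json(x, y, "{0}[{1}]".format(path, i)))
    elif a != b:
        # Scalars, or lists of different lengths
        diffs.append("{0}: {1!r} != {2!r}".format(path, a, b))
    return diffs
```

Using `str.format` rather than f-strings keeps the helper usable from both Python 2 and Python 3 code environments.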


Do you have any idea how to correctly keep only the wanted columns?


Greetings,
Steven

PS: I want to use the scoring recipe with Spark and Java scoring because it is much quicker than loading the predictor and the dataframe and applying it myself (15 minutes, versus 20-30 minutes to load the dataframe into DSS RAM plus 10 minutes of scoring).

PS2: The recipe name passed to the recipe creator isn't the one displayed: it is always "compute_" + [dataset_name].

 

Edit 1: adjusted display, added PS2
