
Apply preparation Script on dataiku api

Solved!
Echternacht

Is it possible to apply a preparation script directly from the Dataiku API?

Inside a notebook I have this (inputdataset and ModelID were defined in a prior cell):

...

df = inputdataset.get_dataframe()
model = dataiku.Model(ModelID)
predictor = model.get_predictor()
predictor.predict(df)

...

 

This returns a ValueError:

"ValueError: The feature ColumnX doesn't exist in the dataset"

ColumnX is a column created in the Script part of the analysis.

This made me believe that the predict method is not applying the preparation script.

In the model report tab, in the "What if?" section, there is a toggle to apply the preparation script. Is there a way to do the same with the Python API, inside a notebook?


Operating system used: Windows 10

SarinaS
Dataiker

Hi @Echternacht,

There are two options that I see here. 

When you train a model using Visual ML, the preparation script will automatically be applied to the input dataset. If you want to simply run a training, then indeed you could do this from the API:

import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# get the visual analysis that holds the ML task
# ('ANALYSIS_ID' is a placeholder; project.list_analyses() shows the available IDs)
analysis = project.get_analysis('ANALYSIS_ID')

# to get the list of ML tasks, to pick for the next step
analysis.list_ml_tasks()

ml_task = analysis.get_ml_task('PREVIOUS_ML_TASK_RESULT')
train_ml_task = ml_task.train()


This will simply perform a training, which will also encompass running the preparation script set in the model analysis screen. 

The other option would be to deploy your script to the Flow as a recipe.

Then, you can simply run the recipe from the API, and use the output dataset of the recipe as the input to your predictor.predict() function. For example, I've deployed my script as the recipe "compute_training_prepared_final".

In my Python script I can then run:

import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# get recipe (the deployed preparation script)
recipe = project.get_recipe('compute_training_prepared_final')

# get model 
model = dataiku.Model('MODEL')
predictor = model.get_predictor()

# get the output dataset of the deployed script 
single_recipe_output = recipe.get_settings().get_recipe_outputs()['main']['items'][0]['ref']

# run the deployed script 
recipe.run()

# get output df 
output_dataset = dataiku.Dataset(single_recipe_output)
output_df = output_dataset.get_dataframe()

# now you can run predictor.predict() on the output dataset of the deployed script
predictor.predict(output_df)
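
If you want to confirm that the prepared dataset really contains the columns the model needs before calling predictor.predict(), a small check like the one below may help. This is only a sketch in plain Python: the helper missing_features and the column names are hypothetical, and the list of required features is something you would supply yourself (for example from the model's feature list in the UI), not something read from the Dataiku API here.

```python
from typing import Iterable, List


def missing_features(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required feature names that are absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Hypothetical column names, for illustration only.
raw_columns = ["ColumnA", "ColumnB"]                  # e.g. list(df.columns)
prepared_columns = ["ColumnA", "ColumnB", "ColumnX"]  # e.g. list(output_df.columns)
required = ["ColumnA", "ColumnX"]                     # features you know the model uses

# The raw input lacks the column created by the preparation script.
print(missing_features(raw_columns, required))
# The prepared output has everything the model needs.
print(missing_features(prepared_columns, required))
```

With your own data you would pass list(output_df.columns) as the first argument. An empty result means every required column is present; it does not by itself guarantee that predict() will succeed.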


Thanks,
Sarina
