Is it possible to apply a preparation script directly from the Dataiku API?
Inside a notebook I have this (inputdataset and ModelID have been defined in a prior cell):
...
df=inputdataset.get_dataframe()
model=dataiku.Model(ModelID)
predictor=model.get_predictor()
predictor.predict(df)
...
This returns a ValueError:
"ValueError: The feature ColumnX doesn't exist in the dataset"
ColumnX is a column created in the Script part of the analysis.
This made me believe that the predict method is not applying the preparation script.
In the model report tab, in the "What if?" section, there is a toggle to apply the preparation script. Is there a way to apply it with the Python API, inside a notebook?
Operating system used: Windows 10
Hi @Echternacht,
There are two options that I see here.
When you train a model using Visual ML, the preparation script is automatically applied to the input dataset. If you simply want to run a training, you can do this from the API:
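import dataiku
client = dataiku.api_client()
project = client.get_default_project()
# get the visual analysis that holds the preparation script and the ML task
analysis = project.get_analysis('ANALYSIS_ID')
# list the ML tasks of the analysis, to pick one for the next step
print(analysis.list_ml_tasks())
ml_task = analysis.get_ml_task('ML_TASK_ID')
# train() runs the analysis script first, then trains; it returns the IDs of the trained models
trained_model_ids = ml_task.train()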
This simply performs a training, which also runs the preparation script defined in the analysis Script tab.
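To make the distinction behind the error concrete: inside the analysis, the script's steps run before the model sees the data, whereas (as the ValueError in the question suggests) predictor.predict() only works on the dataframe you pass it. The pandas sketch below mimics that ordering outside Dataiku. The step function `add_column_x` and the source columns `price` and `quantity` are hypothetical stand-ins for whatever your script does; this is an illustration, not how Dataiku implements scripts internally.

```python
import pandas as pd
from typing import Callable, Iterable

# A "step" takes a DataFrame and returns a new, transformed DataFrame.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def add_column_x(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical script step: derive ColumnX from two assumed source columns."""
    out = df.copy()
    out["ColumnX"] = out["price"] * out["quantity"]
    return out


def apply_steps(df: pd.DataFrame, steps: Iterable[Step]) -> pd.DataFrame:
    """Run each preparation step in order, feeding each output into the next step."""
    for step in steps:
        df = step(df)
    return df


raw = pd.DataFrame({"price": [2.0, 3.5], "quantity": [4, 2]})
prepared = apply_steps(raw, [add_column_x])

# The raw frame lacks ColumnX; only the prepared frame has what the model expects.
print("ColumnX" in raw.columns, "ColumnX" in prepared.columns)  # False True
```

Scoring `raw` directly is the equivalent of the call in the question; scoring `prepared` is what the two options here achieve.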
The other option is to deploy your script to the Flow as a recipe.
You can then run the recipe from the API and use its output dataset as the input to your predictor.predict() call. For example, I've deployed my script as the recipe "compute_training_prepared_final".
In my Python script I can then run:
import dataiku
client = dataiku.api_client()
project = client.get_default_project()
# get the recipe created by deploying the analysis script (use your own recipe name)
recipe = project.get_recipe('compute_training_prepared_final')
# get the saved model and its predictor
model = dataiku.Model('MODEL_ID')
predictor = model.get_predictor()
# get the name of the recipe's single output dataset
single_recipe_output = recipe.get_settings().get_recipe_outputs()['main']['items'][0]['ref']
# run the deployed script so its output dataset is up to date
recipe.run()
# read the prepared output into a dataframe
output_dataset = dataiku.Dataset(single_recipe_output)
output_df = output_dataset.get_dataframe()
# the prepared dataframe now contains the columns created by the script (e.g. ColumnX)
predictions = predictor.predict(output_df)
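If you want to fail fast with a clearer message than the ValueError, you could compare the dataframe's columns with the features the model expects before calling predictor.predict(). This is a minimal pandas sketch: the helper `missing_features` and the `EXPECTED_FEATURES` list are hypothetical, and you would fill the list in yourself from the features your model was trained on.

```python
import pandas as pd
from typing import Iterable, List


def missing_features(df: pd.DataFrame, expected: Iterable[str]) -> List[str]:
    """Return the expected feature names absent from df, keeping their given order."""
    present = set(df.columns)
    return [name for name in expected if name not in present]


# Hypothetical feature list; replace it with the features your model was trained on.
EXPECTED_FEATURES = ["price", "quantity", "ColumnX"]

raw_df = pd.DataFrame({"price": [2.0], "quantity": [4]})
missing = missing_features(raw_df, EXPECTED_FEATURES)
if missing:
    # prints: Run the preparation script first; missing: ['ColumnX']
    print(f"Run the preparation script first; missing: {missing}")
```

An empty result means the dataframe has every listed feature, so it is reasonable to pass it to the predictor.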
Thanks,
Sarina