Apply preparation script from the Dataiku API
Is it possible to apply a preparation script directly from the Dataiku API?
Inside a notebook I have this (inputdataset and ModelID were defined in a prior cell):
...
df = inputdataset.get_dataframe()
model = dataiku.Model(ModelID)
predictor = model.get_predictor()
predictor.predict(df)
...
This returns a ValueError:
"ValueError: The feature ColumnX doesn't exist in the dataset"
ColumnX is a column created in the Script part of the analysis.
This made me believe that the predict method is not applying the preparation script.
In the model report tab, in the "WhatIf?" section, there is a toggle to apply the preparation script. Is there a way to apply it with the Python API, inside a notebook?
Operating system used: Windows 10
Best Answer
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @Echternacht,
There are two options that I see here.
When you train a model using Visual ML, the preparation script is automatically applied to the input dataset. If you simply want to run a training, then you can indeed do this from the API:
    import dataiku
    client = dataiku.api_client()
    project = client.get_default_project()
    model = project.get_saved_model('MODEL_ID')
    # get the visual analysis that holds the script ('ANALYSIS_ID' is a placeholder)
    analysis = project.get_analysis('ANALYSIS_ID')
    # list the ML tasks, to pick one for the next step
    analysis.list_ml_tasks()
    ml_task = analysis.get_ml_task('PREVIOUS_ML_TASK_RESULT')
    train_ml_task = ml_task.train()
This will simply perform a training, which will also encompass running the preparation script set in the model analysis screen.
The other option would be to deploy your script to the Flow as a recipe.
Then you can run the recipe from the API and use the recipe's output dataset as the input to your predictor.predict() function. For example, I've deployed my script as the recipe "compute_training_prepared_modified". In my Python script I can then run:
    import dataiku
    from dataiku import pandasutils as pdu
    import pandas as pd

    client = dataiku.api_client()
    project = client.get_default_project()

    # get recipe
    recipe = project.get_recipe('compute_training_prepared_modified')

    # get model
    model = dataiku.Model('MODEL')
    predictor = model.get_predictor()

    # get the output dataset of the deployed script
    single_recipe_output = recipe.get_settings().get_recipe_outputs()['main']['items'][0]['ref']

    # run the deployed script
    recipe.run()

    # get output df
    output_dataset = dataiku.Dataset(single_recipe_output)
    output_df = output_dataset.get_dataframe()

    # now you can run predictor.predict() on the output dataset of the deployed script
    predictor.predict(output_df)
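As a quick sanity check before calling predictor.predict(), it can help to confirm that the prepared dataframe really contains every column the model expects, since a missing script-generated column is what produced the original ValueError. The following is only a minimal sketch in plain Python: missing_features is a hypothetical helper (not part of the Dataiku API), the column names are made up, and the list of expected features is assumed to be something you write down yourself.

```python
def missing_features(available_columns, expected_features):
    """Return the expected feature names that are absent from available_columns."""
    available = set(available_columns)
    return [name for name in expected_features if name not in available]


# Made-up column names for illustration. In a notebook you would pass
# list(output_df.columns) as the first argument instead.
raw_columns = ["ColumnA", "ColumnB"]                  # dataset before the script
prepared_columns = ["ColumnA", "ColumnB", "ColumnX"]  # dataset after the script
expected = ["ColumnA", "ColumnX"]                     # assumed model features

print(missing_features(raw_columns, expected))       # ['ColumnX']
print(missing_features(prepared_columns, expected))  # []
```

If the returned list is not empty, the dataframe has most likely not gone through the preparation script yet.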
Thanks,
Sarina