Apply preparation script using the Dataiku API

Echternacht Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 1
edited July 16 in Using Dataiku

Is it possible to apply a preparation script directly from the Dataiku API?

Inside a notebook I have this (inputdataset and ModelID were defined in a prior cell):

...

df=inputdataset.get_dataframe()

model=dataiku.Model(ModelID)

predictor=model.get_predictor()

predictor.predict(df)

...

This returns a ValueError:

" ValueError: The feature ColumnX doesn't exist in the dataset "

ColumnX is a column created in the Script part of the analysis.

This makes me believe that the predict method is not applying the preparation script.

In the model report tab, in the "WhatIf?" section, there is a toggle to apply the preparation script. Is there a way to apply it with the Python API, inside a notebook?


Operating system used: Windows 10

Best Answer

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 17 Answer ✓

    Hi @Echternacht,

    There are two options that I see here.

    When you train a model using Visual ML, the preparation script will automatically be applied to the input dataset. If you want to simply run a training, then indeed you could do this from the API:

    import dataiku
    
    client = dataiku.api_client()
    project = client.get_default_project()
    
    # 'ANALYSIS_ID' is a placeholder (assumption): the visual analysis that holds the model's ML task
    analysis = project.get_analysis('ANALYSIS_ID')
    
    # to get the list of ML tasks, to pick for the next step
    analysis.list_ml_tasks()
    
    ml_task = analysis.get_ml_task('PREVIOUS_ML_TASK_RESULT')
    train_ml_task = ml_task.train()


    This will simply perform a training, which will also encompass running the preparation script set in the model analysis screen.

    The other option would be to deploy your script to the flow as a recipe:

    [Screenshot: Screen Shot 2023-01-24 at 12.12.07 PM.png]

    Then, you can simply run the recipe from the API, and use the output dataset of the recipe as your input to your predictor.predict() function. For example, I've deployed my script as the recipe "compute_training_prepared_final" here:

    [Screenshot: Screen Shot 2023-01-24 at 12.14.04 PM.png]

    In my Python script I can then run:

    import dataiku
    
    client = dataiku.api_client()
    project = client.get_default_project()
    
    # get the recipe deployed from the preparation script
    recipe = project.get_recipe('compute_training_prepared_final')
    
    # get model 
    model = dataiku.Model('MODEL')
    predictor = model.get_predictor()
    
    # get the output dataset of the deployed script 
    single_recipe_output = recipe.get_settings().get_recipe_outputs()['main']['items'][0]['ref']
    
    # run the deployed script 
    recipe.run()
    
    # get output df 
    output_dataset = dataiku.Dataset(single_recipe_output)
    output_df = output_dataset.get_dataframe()
    
    # now you can run predictor.predict() on the output dataset of the deployed script
    predictor.predict(output_df)
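
Before calling predictor.predict(), it can help to confirm that the prepared output really contains the column named in the error message. The following is only a minimal sketch in plain Python: missing_columns is a hypothetical helper, not part of the Dataiku API, and the column names are made up for illustration. With a real DataFrame you would pass list(output_df.columns) as the first argument.

```python
def missing_columns(available, required):
    """Return the required column names that are absent from `available`, in order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Hypothetical column names, for illustration only.
raw_columns = ["ColumnA", "ColumnB"]                  # input dataset before the script
prepared_columns = ["ColumnA", "ColumnB", "ColumnX"]  # output of the deployed script
required = ["ColumnA", "ColumnX"]                     # features the model expects

print(missing_columns(raw_columns, required))       # ['ColumnX']
print(missing_columns(prepared_columns, required))  # []
```

If the list is not empty for the recipe output, the deployed script is probably not the same one the model was trained with.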


    Thanks,
    Sarina
