Append a pandas dataframe to an already existing Dataset within a plugin

RicSpd (Partner)

I'm creating a custom plugin containing a recipe that evaluates a machine learning model and outputs a DSS Dataset with performance metrics (much like the built-in Evaluate recipe). However, each time I train the model, I would like to append the new performance record to the existing Dataset rather than overwrite it.

The code I'm using at the end of my plugin recipe to produce this Dataset is the following:

output_dataset_name = get_output_names_for_role('output_perf')[0]
performance_metrics = dataiku.Dataset(output_dataset_name)
performance_metrics.write_with_schema(metrics_df)

metrics_df is the new record of performances that I would like to append to the existing Dataset.

I know that write_with_schema overwrites the existing dataset, but in the docs I couldn't find an argument or another method that appends a pandas dataframe to an existing DSS Dataset. Is there a way to achieve my objective?

Best Answer

  • Liev (Dataiker Alumni)

    Hi @RicSpd

    In the Input/Output tab of your Python recipe, tick the "Append instead of overwrite" option for the output dataset.

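    You can also write the data with the Dataset's write_dataframe method, which by default keeps the existing schema instead of replacing it as write_with_schema does.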

    Good luck!
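To make the intended result concrete, here is a minimal pandas-only sketch of append semantics: each new metrics record is added below the rows already collected, and the column layout stays the same. It does not call the DSS API. The helper name append_metrics and the run/auc columns and values are hypothetical, chosen only for illustration. In the plugin itself, the append would be handled by the recipe's append option described above.

```python
import pandas as pd


def append_metrics(history: pd.DataFrame, new_record: pd.DataFrame) -> pd.DataFrame:
    """Return `history` with `new_record` added at the end (append semantics)."""
    # An empty history simply adopts the new record and its columns.
    if history.empty:
        return new_record.reset_index(drop=True)
    # Align the new record to the existing columns so the layout stays stable.
    aligned = new_record.reindex(columns=history.columns)
    return pd.concat([history, aligned], ignore_index=True)


# Hypothetical metric values, for illustration only.
run_1 = pd.DataFrame([{"run": 1, "auc": 0.81}])
run_2 = pd.DataFrame([{"run": 2, "auc": 0.84}])

history = append_metrics(pd.DataFrame(), run_1)
history = append_metrics(history, run_2)
print(history)
```

Overwriting, by contrast, would leave only the latest record (run 2 here) in the output.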

