Append a pandas dataframe to an already existing Dataset within a plugin


I'm creating a custom plugin containing a recipe that evaluates a machine learning model and outputs a DSS Dataset with performance metrics (it is very similar to the built-in Evaluate recipe). However, each time I train the model, I would like to append the new performance record to the already-existing Dataset rather than overwrite it.

The code I'm using at the end of my plugin recipe to produce this Dataset is the following:

```python
import dataiku
from dataiku.customrecipe import get_output_names_for_role

output_dataset_name = get_output_names_for_role('output_perf')[0]
performance_metrics = dataiku.Dataset(output_dataset_name)
performance_metrics.write_with_schema(metrics_df)
```

metrics_df holds the new performance record that I would like to append to the existing Dataset.
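To make "one record per evaluation run" concrete, here is a minimal sketch of how such a metrics_df could be built as a one-row DataFrame with a UTC timestamp, so that appended runs stay distinguishable. The helper name build_metrics_record, the column names and the metric values are all hypothetical placeholders, not part of the recipe above.

```python
from datetime import datetime, timezone

import pandas as pd


def build_metrics_record(model_name, metrics):
    """Return a one-row DataFrame describing a single evaluation run.

    Hypothetical helper: `model_name` and the keys of `metrics` stand in
    for whatever your evaluation step actually computes.
    """
    row = {
        # UTC timestamp so that appended runs can be told apart and sorted
        "evaluation_date": datetime.now(timezone.utc).isoformat(),
        "model_name": model_name,
    }
    row.update(metrics)  # one column per metric, in insertion order
    return pd.DataFrame([row])


metrics_df = build_metrics_record("my_model", {"accuracy": 0.91, "auc": 0.95})
print(metrics_df)
```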

I know that write_with_schema overwrites the existing dataset, but in the docs I couldn't find an argument or another method that appends a pandas dataframe to an existing DSS Dataset. Is there a way to achieve my objective?


Best Answer


    Hi @RicSpd

    In the Input/Output tab of your Python recipe, tick the option "Append instead of overwrite".

    You can also use the write_dataframe method.

    Good luck!
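If the append option cannot be used in your setup, one workaround (a different technique from the one described in the answer) is to read the existing rows, concatenate the new record and write everything back. Below is a minimal sketch of the concatenation step in plain pandas; the helper name append_records is hypothetical. The DSS wiring is only shown as comments because it assumes the output Dataset can be read back from inside the recipe, which may not hold in every configuration.

```python
import pandas as pd


def append_records(existing_df, new_df):
    """Return the existing rows followed by the new rows, with a fresh index.

    Columns are aligned by name; a column missing on one side becomes NaN.
    """
    if existing_df.empty:
        # First run: nothing stored yet, so the new record is the whole history
        return new_df.reset_index(drop=True)
    return pd.concat([existing_df, new_df], ignore_index=True, sort=False)


# Hypothetical DSS wiring (not executed here; assumes the output can be read):
#   perf = dataiku.Dataset(output_dataset_name)
#   combined = append_records(perf.get_dataframe(), metrics_df)
#   perf.write_with_schema(combined)

history = pd.DataFrame({"run": [1, 2], "auc": [0.90, 0.92]})
latest = pd.DataFrame({"run": [3], "auc": [0.94]})
combined = append_records(history, latest)
print(combined)
```

Note that this rewrites the whole Dataset on every run, so its cost grows with the history; when available, the append option in the Input/Output tab avoids that.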
