Append a pandas dataframe to an already existing Dataset within a plugin

Solved!
RicSpd
Level 2
I'm creating a custom plugin containing a recipe that evaluates a machine learning model and outputs a DSS Dataset with performance metrics (it is very similar to the in-built Evaluate recipe). However, each time I train the model, I would like to append the new performance record to the already-existing Dataset rather than overwriting it.

The code I'm using at the end of my plugin recipe to produce this Dataset is the following:

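import dataiku
from dataiku.customrecipe import get_output_names_for_role
# Resolve the dataset bound to the 'output_perf' role, then write the metrics to it
output_dataset_name = get_output_names_for_role('output_perf')[0]
performance_metrics = dataiku.Dataset(output_dataset_name)
performance_metrics.write_with_schema(metrics_df)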

metrics_df is the new performance record that I would like to append to the existing Dataset.

I know that write_with_schema overwrites the existing dataset, but in the docs I couldn't find an argument or another method that appends a pandas dataframe to an existing DSS Dataset. Is there a way to achieve my objective?

1 Solution
Liev
Dataiker Alumni

Hi @RicSpd 

In the Input/Output tab of your Python recipe, you should tick the option "Append instead of overwrite".

You can also use the write_dataframe method.
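As a rough sketch of how that could look in the recipe (the helper names append_metrics and write_plugin_output are made up for illustration, and this assumes the output Dataset already has a schema matching metrics_df): write_dataframe writes with the schema already stored on the Dataset, while write_with_schema sets the schema from the dataframe. Whether rows are appended or replaced still depends on the Append option being ticked for that output.

```python
def append_metrics(output_dataset, metrics_df):
    """Write metrics_df using the dataset's existing schema.

    output_dataset is expected to behave like a dataiku.Dataset. Whether the
    rows are appended or replace the current content is decided by the
    append setting of the recipe output, not by this call.
    """
    # write_dataframe leaves the stored schema untouched by default,
    # whereas write_with_schema sets it again from the dataframe.
    output_dataset.write_dataframe(metrics_df)
    return output_dataset


def write_plugin_output(metrics_df, role="output_perf"):
    """Hypothetical recipe wiring; it only runs inside a DSS environment."""
    # Imported lazily because the dataiku package is only available in DSS.
    import dataiku
    from dataiku.customrecipe import get_output_names_for_role

    output_dataset_name = get_output_names_for_role(role)[0]
    return append_metrics(dataiku.Dataset(output_dataset_name), metrics_df)
```

At the end of the plugin recipe you would then call write_plugin_output(metrics_df) in place of the write_with_schema call.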

Good luck!

 


3 Replies
RicSpd
Level 2
Author

It was so simple I feel a little stupid 😅
Thanks @Liev!

eduardcmp
Level 1

Hi, I have the same problem, but the "Append instead of overwrite" option is not working for me. Any ideas?
