Append a pandas dataframe to an already existing Dataset within a plugin

I'm creating a custom plugin containing a recipe that evaluates a machine learning model and outputs a DSS Dataset with performance metrics (it is very similar to the in-built Evaluate recipe). However, each time I train the model, I would like to append the new performance record to the already-existing Dataset rather than overwriting it.
The code I'm using at the end of my plugin recipe to produce this Dataset is the following:
output_dataset_name = get_output_names_for_role('output_perf')[0]
performance_metrics = dataiku.Dataset(output_dataset_name)
performance_metrics.write_with_schema(metrics_df)
metrics_df is the new performance record that I would like to append to the existing Dataset.
I know that write_with_schema overwrites the existing dataset, but in the docs I couldn't find an argument or another method that appends a pandas dataframe to an existing DSS Dataset. Is there a way to achieve my objective?
Best Answer
-
Hi @RicSpd
In the Input/Output tab of your Python recipe, tick the "Append instead of overwrite" option on the output dataset.
You can also use the write_dataframe method.
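For reference, here is a minimal sketch of what the end of your recipe could look like once the append option is ticked (it reuses the output_perf role and metrics_df dataframe from your post; treat it as an illustration rather than a tested implementation):

import dataiku
from dataiku.customrecipe import get_output_names_for_role

# Resolve the dataset mapped to the 'output_perf' output role of the plugin recipe
output_dataset_name = get_output_names_for_role('output_perf')[0]
performance_metrics = dataiku.Dataset(output_dataset_name)

# With "Append instead of overwrite" ticked on the output, write_dataframe
# adds the new rows to the existing data instead of replacing it.
# write_with_schema would also work, but it redefines the schema on every run,
# which is unnecessary once the dataset and its schema already exist.
performance_metrics.write_dataframe(metrics_df)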
Good luck!
Answers
-
eduardcmp
Hi, I have the same problem, but "Append instead of overwrite" is not working for me... Any ideas?
-
I have the same problem: I am not able to append new data without setting "Append instead of overwrite". I think I know why — the related table is not declared as an output of my recipe. My question is: what if I want to write new data with Python to a dataset that is not an output of the recipe — how can I do that?
-
Turribeach
This thread has been marked as answered already. Please start a new thread. You can refer to this thread in your new thread.