Append a pandas dataframe to an already existing Dataset within a plugin
I'm creating a custom plugin containing a recipe that evaluates a machine learning model and outputs a DSS Dataset with performance metrics (it is very similar to the built-in Evaluate recipe). However, each time I train the model, I would like to append the new performance record to the existing Dataset rather than overwrite it.
The code I'm using at the end of my plugin recipe to produce this Dataset is the following:
output_dataset_name = get_output_names_for_role('output_perf')[0]
performance_metrics = dataiku.Dataset(output_dataset_name)
performance_metrics.write_with_schema(metrics_df)
metrics_df is the new performance record that I would like to append to the existing Dataset.
I know that write_with_schema overwrites the existing dataset, but in the docs I couldn't find an argument or another method that appends a pandas dataframe to an existing DSS Dataset. Is there a way to achieve my objective?
Best Answer
Hi @RicSpd
In the Input/Output tab of your Python recipe, you should tick the option to append instead of overwrite.
You can also use the write_dataframe method.
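If the append option is not available in your setup, or does not seem to take effect, one fallback that does not depend on it is to read the existing records, concatenate the new record in pandas, and write the combined dataframe back. Below is a minimal sketch of that idea, not a confirmed DSS recipe pattern: the helper name append_record and the column names run and auc are made up for illustration, and the commented DSS calls assume the output dataset can be read from inside the recipe and has already been built at least once.

```python
import pandas as pd


def append_record(existing_df, new_df):
    """Return the existing rows followed by the new rows, with a fresh index.

    Hypothetical helper: it only handles the pandas side of the append.
    """
    # On the very first run there may be no history yet.
    if existing_df is None or existing_df.empty:
        return new_df.reset_index(drop=True)
    return pd.concat([existing_df, new_df], ignore_index=True)


# Illustrative data only; the real columns come from your metrics_df.
history_df = pd.DataFrame({"run": [1], "auc": [0.80]})
metrics_df = pd.DataFrame({"run": [2], "auc": [0.85]})
combined_df = append_record(history_df, metrics_df)

# Inside the plugin recipe (needs the DSS runtime, so left as comments;
# reading the output dataset here is an assumption):
#   performance_metrics = dataiku.Dataset(output_dataset_name)
#   history_df = performance_metrics.get_dataframe()
#   performance_metrics.write_with_schema(append_record(history_df, metrics_df))
```

This rewrites the whole dataset on every run, which should be fine for a small table of metrics but is not a true append.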
Good luck!
Answers
eduardcmp
Hi, I have the same problem, but the append instead of overwrite option is not working for me. Any ideas?