How to append a dataframe to an existing output dataset
Hello experts,
In Dataiku v12.3.0, I am trying to append a dataframe to an existing dataset (with the same schema) using write_dataframe(). But it always overwrites the dataset with the last dataframe, even though the dataset spec is configured like this:
dataset.spec_item["appendMode"] = True
The dataset is declared as an output, so I cannot call dataset.get_dataframe() on it. It throws the exception: "You cannot read dataset test.my-dataset, it is not declared as an input"
Regards,
upx86
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
You don't set this in code. You set it on the Inputs/Outputs tab of the recipe by ticking the "Append instead of overwrite" check box. Then you write to the output as in any normal recipe and Dataiku will do the append for you.
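To illustrate the point that the recipe code does not change, here is a minimal sketch. The helper name `write_batch` is made up for this example. It only assumes that the handle passed in exposes `write_dataframe()`, as a `dataiku.Dataset` for a declared recipe output does. Whether the rows end up appended or overwritten is decided by the check box, not by anything in this code.

```python
import pandas as pd


def write_batch(output_dataset, df: pd.DataFrame) -> int:
    """Write one dataframe to a recipe output and return the number of rows.

    `output_dataset` is expected to be a dataiku.Dataset handle for an output
    declared on the recipe. `write_batch` is a hypothetical helper, not part
    of the Dataiku API. Append vs. overwrite is controlled by the recipe's
    Inputs/Outputs settings, not here.
    """
    output_dataset.write_dataframe(df)
    return len(df)


# Inside a DSS Python recipe (not runnable outside DSS):
#   import dataiku
#   out = dataiku.Dataset("my-dataset")
#   write_batch(out, df)
```

The dataset name `my-dataset` is taken from the error message in the question; replace it with your own output name.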
-
Where exactly is this "Append instead of overwrite" check box? I don't seem to have it in the Inputs/Outputs tab of the Python recipe.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
In a Python recipe it is in the Inputs/Outputs tab, below the output dataset name. In your case it is not present because your output dataset uses a connection type (S3) that does not support writing in append mode.
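When the output connection cannot append, one possible workaround (a suggestion on top of the answer above, not something confirmed in this thread) is to collect the dataframes produced during a single recipe run and write them once, so that no write overwrites an earlier one. The sketch below uses only pandas; `combine_batches` and the sample data are hypothetical.

```python
import pandas as pd


def combine_batches(batches):
    """Concatenate an iterable of same-schema dataframes into one dataframe."""
    frames = list(batches)
    if not frames:
        # Nothing to combine: return an empty dataframe.
        return pd.DataFrame()
    # ignore_index=True gives the result a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Hypothetical data standing in for the dataframes built in the recipe.
batch_1 = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
batch_2 = pd.DataFrame({"id": [3], "value": ["c"]})
combined = combine_batches([batch_1, batch_2])

# Inside a DSS Python recipe, a single write then replaces repeated writes:
#   import dataiku
#   dataiku.Dataset("my-dataset").write_dataframe(combined)
```

This only covers dataframes created within one run of the recipe. It does not keep rows written by previous runs.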