Added on January 12, 2024 11:39AM
Hello experts,
In Dataiku v12.3.0, I was trying to append a dataframe to an existing dataset (with the same schema) using write_dataframe(). But it always overwrites the dataset with the last dataframe, even though the dataset spec is configured like this:
dataset.spec_item["appendMode"] = True
The dataset is declared as an output, so it doesn't let me use dataset.get_dataframe(). It throws the exception: "You cannot read dataset test.my-dataset, it is not declared as an input"
Regards,
upx86
You don't set this in code; you do it on the Inputs/Outputs tab of the recipe by ticking the "Append instead of overwrite" checkbox. Then you write to the output as in a normal recipe, and Dataiku will do the append for you.
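For illustration, here is a minimal sketch of what the recipe code looks like once that checkbox is ticked (dataset names are hypothetical); nothing append-specific is needed in the code itself:

import dataiku

# Read the input dataset as usual (names here are placeholders)
input_ds = dataiku.Dataset("my_input")
df = input_ds.get_dataframe()

# Write to the output normally. With "Append instead of overwrite"
# ticked on the recipe's Inputs/Outputs tab, this call appends the
# rows instead of replacing the existing data.
output_ds = dataiku.Dataset("my_output")
output_ds.write_dataframe(df)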
Where exactly is this "Append instead of overwrite" checkbox? I don't seem to have it in the Inputs/Outputs tab of the Python recipe.
In a Python recipe it will be in the Inputs/Outputs tab, below the output dataset name (see screenshot below). But in your case it's not present, since you are using an output dataset connection type (S3) that does not support writing in append mode.
If the output dataset does not support "Append instead of overwrite", is there any solution?
Yes, the solution is to use a connection type that supports writing in append mode. Failing that, you could use a circular recipe (allowed in v13) to first read the whole output dataset, then add the new records, and then write the whole thing back. Very inefficient, though…
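A rough sketch of that workaround, assuming the output dataset is also declared as an input of the recipe (that is what makes it a circular recipe), with hypothetical dataset names:

import dataiku
import pandas as pd

# Circular recipe: the output dataset is also declared as an input,
# so its current contents can be read back.
existing = dataiku.Dataset("my_output").get_dataframe()
new_rows = dataiku.Dataset("my_input").get_dataframe()

# Concatenate and rewrite everything. This rewrites the full dataset
# on every run, so it is inefficient for large data.
combined = pd.concat([existing, new_rows], ignore_index=True)
dataiku.Dataset("my_output").write_with_schema(combined)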