Make Dataiku Managed Datasets Less Opinionated (aka stop dropping my tables)

importthepandas · March 2023

After 11.4.0 (or earlier, as we upgraded from 11.0.3), Dataiku now defaults to dropping and re-creating the table when using the Dataset Python APIs if for some reason the dataset schema and the underlying table do not match. It does this silently and the job still passes, and only later do we find out that we've lost our history in the base table (Snowflake in our case).

This is really scary default behavior. Dataiku should default to throwing errors and stopping jobs vs dropping tables and re-creating them. Even better, the user should be able to intelligently control this behavior.

"Don't do exotic things importthepandas and you'll be ok" - sure, however, if someone changes or alters a data type in Snowflake and forgets to re-sync schemas in Dataiku, DSS should not drop my table.

Give us more flexibility for the drop/re-create behavior.
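
A minimal sketch of the fail-fast behavior requested above, in plain Python. It does not call the Dataiku API: `SchemaMismatchError`, `diff_schemas` and `assert_schemas_match` are hypothetical names, and the column format (a list of `{"name", "type"}` dicts) is an assumption. It also assumes both sides already use the same type vocabulary, which is not the case out of the box between DSS types and Snowflake types.

```python
class SchemaMismatchError(RuntimeError):
    """Raised when the dataset schema and the table columns disagree."""


def diff_schemas(expected, actual):
    """Return a sorted list of differences between two column lists.

    Each argument is a list of {"name": ..., "type": ...} dicts
    (an assumed format). Names and types are compared case-insensitively.
    """
    exp = {c["name"].lower(): c["type"].lower() for c in expected}
    act = {c["name"].lower(): c["type"].lower() for c in actual}

    problems = []
    for name in sorted(exp.keys() - act.keys()):
        problems.append(f"column '{name}': missing from table")
    for name in sorted(act.keys() - exp.keys()):
        problems.append(f"column '{name}': missing from dataset schema")
    for name in sorted(exp.keys() & act.keys()):
        if exp[name] != act[name]:
            problems.append(
                f"column '{name}': dataset says {exp[name]}, table says {act[name]}"
            )
    return problems


def assert_schemas_match(expected, actual):
    """Stop the job with an error instead of letting a write proceed."""
    problems = diff_schemas(expected, actual)
    if problems:
        raise SchemaMismatchError("; ".join(problems))


if __name__ == "__main__":
    dataset_schema = [{"name": "id", "type": "bigint"},
                      {"name": "amount", "type": "double"}]
    table_columns = [{"name": "id", "type": "bigint"},
                     {"name": "amount", "type": "string"}]
    try:
        assert_schemas_match(dataset_schema, table_columns)
    except SchemaMismatchError as err:
        print(f"refusing to write: {err}")
```

A recipe could run a check like this before any write and fail the job on a mismatch. How the two column lists are obtained (from the dataset definition and from the warehouse) is left out here on purpose.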

tgb417 · April 2023

This is particularly important when working with slow-to-acquire datasets. I've lost a few days to a dataset getting dropped in similar scenarios.

apichery · May 2023

We fixed the issue in DSS 11.4.1 and above.

Turribeach · May 2023

Great to see this fixed!

importthepandas · May 2023

@apichery
Python dataset methods no longer drop by default?

Released · Last Updated March 2023
