Make Dataiku Managed Datasets Less Opinionated (aka stop dropping my tables)

After upgrading to 11.4.0 (or possibly earlier; we came from 11.0.3), Dataiku now defaults to dropping and re-creating the underlying table when using the Dataset Python APIs if, for some reason, the dataset schema and the table no longer match. It does this silently and the job passes, and only later do we discover that we've lost our history in the base table (Snowflake, in our case).

This is really scary default behavior. Dataiku should default to throwing an error and stopping the job rather than dropping and re-creating the table. Better still, the user should be able to control this behavior explicitly.

"Dont do exotic things importthepandas and youll be ok" - sure, however, if someone changes or alters a data type in snowflake and forgets to re-sync schemas in dataiku, DSS should not drop my table.

Give us more flexibility for the drop/re-create behavior.
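Until that flexibility exists, one workaround is to guard the write yourself: compare the dataset's declared schema against what you are about to write, and fail loudly on drift instead of letting the table get re-created. Below is a minimal sketch. It assumes the list-of-dicts column shape (`{"name": ..., "type": ...}`) that `dataiku.Dataset.read_schema()` returns; the recipe wiring in the comment is hypothetical and untested.

```python
def schema_mismatches(table_cols, df_cols):
    """Return a list of (column, table_type, incoming_type) tuples
    for every column whose name or type differs between the two
    schemas. Both arguments are lists of {"name": ..., "type": ...}
    dicts (assumption: the shape Dataset.read_schema() produces).
    A missing column shows up with None on the side that lacks it."""
    table = {c["name"]: c["type"] for c in table_cols}
    incoming = {c["name"]: c["type"] for c in df_cols}
    return [
        (name, table.get(name), incoming.get(name))
        for name in table.keys() | incoming.keys()
        if table.get(name) != incoming.get(name)
    ]

# Hypothetical use inside a Python recipe (names are placeholders):
#   ds = dataiku.Dataset("my_snowflake_table")
#   problems = schema_mismatches(ds.read_schema(), expected_cols)
#   if problems:
#       raise RuntimeError(f"Schema drift, refusing to write: {problems}")
#   ds.write_dataframe(df)
```

This doesn't stop DSS from dropping the table on its own, but it turns a silent data loss into a failed job you can investigate.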

4 Comments

This is particularly important when working with slow-to-acquire datasets. I've lost a few days to a dataset getting dropped in similar scenarios.

--Tom

apichery
Dataiker

We fixed the issue in DSS 11.4.1 and above.

Status changed to: Released

Great to see this fixed!

@apichery Python dataset methods no longer drop by default?
