Make Dataiku Managed Datasets Less Opinionated (aka stop dropping my tables)

After upgrading to 11.4.0 (or possibly earlier; we came from 11.0.3), Dataiku now defaults to dropping and re-creating the underlying table when using the Dataset Python APIs if, for some reason, the dataset schema and the table no longer match. It does this silently and the job passes, and only later do we discover that we've lost our history in the base table (Snowflake, in our case).

This is really scary default behavior. Dataiku should default to throwing an error and stopping the job rather than dropping and re-creating the table. Better still, the user should be able to control this behavior explicitly.

"Dont do exotic things importthepandas and youll be ok" - sure, however, if someone changes or alters a data type in snowflake and forgets to re-sync schemas in dataiku, DSS should not drop my table.

Give us more flexibility for the drop/re-create behavior.
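Until that flexibility exists, one workaround is to guard the write yourself: compare the dataset's declared schema against what you are about to write, and fail loudly on drift instead of letting the table get re-created. Below is a minimal sketch. It assumes the list-of-dicts column shape (`{"name": ..., "type": ...}`) that `dataiku.Dataset.read_schema()` returns; the recipe wiring in the comment is hypothetical and untested.

```python
def schema_mismatches(table_cols, df_cols):
    """Return a list of (column, table_type, incoming_type) tuples
    for every column whose name or type differs between the two
    schemas. Both arguments are lists of {"name": ..., "type": ...}
    dicts (assumption: the shape Dataset.read_schema() produces).
    A missing column shows up with None on the side that lacks it."""
    table = {c["name"]: c["type"] for c in table_cols}
    incoming = {c["name"]: c["type"] for c in df_cols}
    return [
        (name, table.get(name), incoming.get(name))
        for name in table.keys() | incoming.keys()
        if table.get(name) != incoming.get(name)
    ]

# Hypothetical use inside a Python recipe (names are placeholders):
#   ds = dataiku.Dataset("my_snowflake_table")
#   problems = schema_mismatches(ds.read_schema(), expected_cols)
#   if problems:
#       raise RuntimeError(f"Schema drift, refusing to write: {problems}")
#   ds.write_dataframe(df)
```

This doesn't stop DSS from dropping the table on its own, but it turns a silent data loss into a failed job you can investigate.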

4 Comments

This is particularly important when working with slow-to-acquire datasets. I've lost a few days to a dataset getting dropped in similar scenarios.

--Tom

apichery
Dataiker

We fixed the issue in DSS 11.4.1 and above.

Status changed to: Released

Great to see this fixed!

@apichery Python dataset methods no longer drop by default?
