Make Dataiku Managed Datasets Less Opinionated (aka stop dropping my tables)
Since 11.4.0 (or possibly earlier, as we upgraded from 11.0.3), Dataiku now defaults to dropping and re-creating the table when using the Dataset Python APIs if for some reason the dataset schema and the underlying table do not match. It does this silently and the job still passes; only later do we find out that we've lost our history in the base table (Snowflake in our case).
This is really scary default behavior. Dataiku should default to throwing errors and stopping jobs vs dropping tables and re-creating them. Even better, the user should be able to intelligently control this behavior.
"Don't do exotic things, importthepandas, and you'll be ok." Sure, but if someone alters a data type in Snowflake and forgets to re-sync schemas in Dataiku, DSS should not drop my table.
Give us more flexibility for the drop/re-create behavior.
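To make the "throw an error instead of dropping" behavior concrete, here is a minimal sketch of the kind of guard I mean. It is plain Python and not Dataiku's implementation. The helper names `schema_mismatches` and `assert_schemas_match` are hypothetical, and it assumes both schemas are already available as lists of `{"name": ..., "type": ...}` dicts with type names normalized to the same vocabulary (in practice the dataset side might come from the Dataset API and the table side from Snowflake's column metadata, with a type mapping in between).

```python
def schema_mismatches(dataset_columns, table_columns):
    """Return human-readable differences between two schemas.

    Both arguments are lists of {"name": ..., "type": ...} dicts.
    Names and types are compared case-insensitively (an assumption
    made for this sketch, since Snowflake upper-cases identifiers).
    """
    expected = {c["name"].lower(): c["type"].lower() for c in dataset_columns}
    actual = {c["name"].lower(): c["type"].lower() for c in table_columns}

    problems = []
    for name, col_type in expected.items():
        if name not in actual:
            problems.append(f"column '{name}' is missing from the table")
        elif actual[name] != col_type:
            problems.append(
                f"column '{name}': dataset says {col_type}, table says {actual[name]}"
            )
    for name in actual:
        if name not in expected:
            problems.append(f"column '{name}' is in the table but not in the dataset schema")
    return problems


def assert_schemas_match(dataset_columns, table_columns):
    """Fail loudly instead of letting a write drop and re-create the table."""
    problems = schema_mismatches(dataset_columns, table_columns)
    if problems:
        raise RuntimeError("Refusing to write, schema mismatch: " + "; ".join(problems))
```

Calling `assert_schemas_match(...)` before any write would fail the job on a mismatch, which is the default I am asking for; dropping and re-creating would then be something the user opts into explicitly.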
Comments
-
tgb417
This is particularly important when working with slow-to-acquire datasets. I've lost a few days to a dataset getting dropped in similar scenarios.
-
We fixed the issue in DSS 11.4.1 and later versions.
-
Turribeach
Great to see this fixed!
-
importthepandas
@apichery
So the Python dataset methods no longer drop the table by default?