Is it possible to disable Dataiku data type detection?
Hello,
Is it possible to disable Dataiku's automatic data type detection? I find this feature more trouble than help and would prefer to have everything read in and kept as a string unless I explicitly cast it to something else.
Some specific troubles that relate to this are:
(1) Auto-detecting ID columns as integers rather than strings for new files
(2) Inferring types from the first x records of a union, which happen to be nulls, and thus forcing the type to bigint rather than double.
Thanks,
Operating system used: Windows
Answers
-
Hello,
Yes, you can adjust Dataiku's automatic data type detection:
- During data import, select "Advanced" and choose to read all columns as strings.
- To address your concerns:
  - Manually set ID columns as strings during import.
  - Set types before union operations to prevent incorrect type inference.
Always review the schema after actions to ensure correctness.
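As a rough illustration of the "read everything as a string, then cast explicitly" approach in a Python step, here is a minimal sketch using only the standard library. The sample data, column names, and helper functions are made up for the example; in pandas, the comparable setting is `dtype=str` on `pandas.read_csv`.

```python
import csv
import io

# Hypothetical sample data: an ID column with leading zeros and an amount column.
RAW = "customer_id,amount\n00123,10.5\n00456,\n"

def read_all_as_strings(text):
    """Parse CSV text without any type inference: every cell stays a str."""
    return list(csv.DictReader(io.StringIO(text)))

def cast_amount(rows):
    """Explicitly cast only the 'amount' column; empty cells become None."""
    out = []
    for row in rows:
        amount = float(row["amount"]) if row["amount"] else None
        out.append({"customer_id": row["customer_id"], "amount": amount})
    return out

rows = cast_amount(read_all_as_strings(RAW))
```

Because the ID column is never cast, values such as `00123` keep their leading zeros, and only the columns you name are converted.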
I hope this will help you!
-
I have this problem as well, but it extends beyond just the initial data import. Recipes that use Python (and specifically pandas) sample the top of the table to determine data types. I have a field that contains item numbers; in nearly all cases they are integers, but sometimes they have a letter suffix. The type detection in pandas treats the field as an integer just long enough to force the schema; then, when the full data arrives, the database freaks out about the type mismatch. This occurs in several places, most infuriatingly in the time series resampling recipe.
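To make the failure mode concrete, here is a small sketch of head-sample inference going wrong. The inference rule and the item numbers are invented for illustration; this is not Dataiku's or pandas' actual logic, only the general shape of the problem.

```python
def infer_type(values, sample_size=None):
    """Guess a column type from the first `sample_size` values (all values if None).

    Returns "bigint" when every sampled non-empty value parses as an integer,
    otherwise "string". Illustrative rule only, not a real library's behaviour.
    """
    sample = values if sample_size is None else values[:sample_size]
    non_empty = [v for v in sample if v not in ("", None)]
    if not non_empty:
        return "string"  # nothing to go on, so stay with the safest type
    try:
        for v in non_empty:
            int(v)
    except ValueError:
        return "string"
    return "bigint"

# Hypothetical item numbers: mostly numeric, one letter-suffixed value late in the data.
item_numbers = ["1001", "1002", "1003", "1004", "1005A"]

head_guess = infer_type(item_numbers, sample_size=3)  # only sees numeric values
full_guess = infer_type(item_numbers)                 # sees the "1005A" row too
```

The head sample settles on an integer type while a full scan would not, which is why forcing such columns to string up front tends to be the safer choice. In a Dataiku Python recipe, the `infer_with_pandas` argument of `Dataset.get_dataframe` may be relevant here; check the API documentation for your DSS version before relying on it.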
-
I had a similar problem involving Dataiku's data type detection. It is definitely an area for improvement.
-
tgb417
I’ve been pointing out these “duck typing” challenges with columns for a while now. I’ve submitted two product ideas, and it would be great to get the Dataiku team further feedback on them.
Please consider “voting” for either of these ideas, or adding your own product idea if neither of these covers your use case or suggestion.
-
psvnm
Hi, I am not able to find the option under "Advanced" that you mentioned. Could you please check again?