Is it possible to disable Dataiku's automatic data type detection? I find this feature more trouble than it is worth and would prefer to have everything read in and kept as a string unless I explicitly cast it to something else.
Some specific troubles that relate to this are:
(1) Auto-detecting ID columns as integers rather than strings for new files
(2) Determining detected types from the first x records in a union, which happen to be nulls, thus forcing the type to be bigint rather than double.
Operating system used: Windows
Yes, you can adjust Dataiku's automatic data type detection:
Always review the schema after actions to ensure correctness.
I hope this will help you!
I have this problem as well, but it extends beyond just the initial data import. Recipes that use Python (and specifically pandas) sample the top of the table to determine data types. I have a field that contains item numbers; in nearly all cases they are integers, but sometimes they have a letter suffix. The type detection in pandas then treats the field as an integer just long enough to force the schema. Then, when the data arrives, the database freaks out about the type mismatch. This occurs in several places, most infuriatingly in the time series resampling recipe.
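To illustrate the sampling issue in plain pandas (outside of Dataiku), here is a minimal sketch. The `item_number` column and its values are made up for the example, and it only assumes that the types are inferred from the first few rows. It shows that the sample looks like an integer column, while reading with `dtype=str` keeps every value as a string until you cast it yourself.

```python
import io

import pandas as pd

# Hypothetical item numbers: mostly digits, with one letter-suffixed value further down.
csv_text = "item_number\n1001\n1002\n1003\n1004A\n"

# Inferring the type from only the first rows sees nothing but integers.
sample = pd.read_csv(io.StringIO(csv_text), nrows=3)
print("sampled dtype:", sample["item_number"].dtype)

# Asking pandas to read everything as strings skips inference entirely,
# so the suffixed value no longer conflicts with an integer schema.
full = pd.read_csv(io.StringIO(csv_text), dtype=str)
print("values read as strings:", full["item_number"].tolist())
```

Whether a given Dataiku recipe lets you pass an option like this through is a separate question. The sketch only shows why a head-of-table sample can produce a schema that the full data does not fit.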
I had a similar problem involving Dataiku's Data Type detection. It is definitely an area for improvement.
I’ve been pointing out these column “duck typing” challenges for a while now. I’ve submitted two product ideas, and it would be great if others gave the Dataiku team further feedback on them.
Please consider “voting” for either of these ideas, or adding your own product idea if neither of them covers your use case or suggestion.