Is it possible to disable Dataiku data type detection?

wvde

Hello, 

Is it possible to disable Dataiku's automatic data type detection? I find this feature more trouble than it is worth, and I would prefer to have every column read in and kept as a string unless I explicitly cast it to something else.

Some specific problems this causes:

(1) ID columns in new files are auto-detected as integers rather than strings.

(2) In a union, types are detected from the first x records. When those records happen to be nulls, the type is forced to bigint rather than double.
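Both problems come from inferring a column's type from a small sample of values. The Python sketch below is a hypothetical toy, not Dataiku's actual algorithm: `infer_type` and `sample_size` are illustrative names, and the fallback from an all-null sample to bigint is an assumption chosen to mimic the behaviour described above.

```python
from typing import Optional, Sequence


def _parses_as(cast, value: str) -> bool:
    """Return True if `cast(value)` succeeds without a ValueError."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_type(values: Sequence[Optional[str]], sample_size: int) -> str:
    """Toy sample-based inference (hypothetical, not Dataiku's algorithm).

    Only the first `sample_size` values are inspected; nulls are skipped.
    """
    sample = [v for v in values[:sample_size] if v not in (None, "")]
    if not sample:
        # Assumed fallback: an all-null sample becomes bigint, mimicking the
        # behaviour reported in the post.
        return "bigint"
    if all(_parses_as(int, v) for v in sample):
        return "bigint"
    if all(_parses_as(float, v) for v in sample):
        return "double"
    return "string"


# Issue (1): ID-like strings look like integers, so leading zeros would be lost.
ids = ["00123", "00456", "A0789"]
print(infer_type(ids, sample_size=2))       # bigint
print(infer_type(ids, sample_size=3))       # string

# Issue (2): a union whose first rows are null hides the real (double) type.
amounts = [None, None, "1.5", "2.25"]
print(infer_type(amounts, sample_size=2))   # bigint
print(infer_type(amounts, sample_size=4))   # double
```

Keeping every column as a string, as requested above, sidesteps both cases because nothing is guessed from the sample.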

Thanks,


Operating system used: Windows

Miasm1

Hello,

Yes, you can adjust Dataiku's automatic data type detection:

  1. During data import, select "Advanced" and choose to read all columns as strings.
  2. To address your specific concerns:
     • Manually set ID columns as strings during import.
     • Set types explicitly before union operations to prevent incorrect type inference.
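The same idea can be sketched in a Python recipe with pandas rather than in the Dataiku import dialog: read every column as a string, then apply one explicit type mapping to both inputs before the union. The CSV contents and the `schema` mapping below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical inputs: in the first file, "amount" is entirely null.
csv_a = "id,amount\n00123,\n00456,\n"
csv_b = "id,amount\n00789,1.5\n01011,2.25\n"

# Read every column as a string so nothing is guessed at read time
# (empty fields still become NaN).
a = pd.read_csv(io.StringIO(csv_a), dtype=str)
b = pd.read_csv(io.StringIO(csv_b), dtype=str)

# Decide the types explicitly, and identically, for both inputs before the union.
schema = {"id": "string", "amount": "float64"}
union = pd.concat([a.astype(schema), b.astype(schema)], ignore_index=True)

print(union.dtypes)
print(union)
```

Because both frames are cast with the same mapping, the all-null `amount` column ends up as float64 instead of depending on which rows happen to come first, and the IDs keep their leading zeros.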

Always review the schema after each step to make sure the types are correct.

I hope this helps!

Regards,
Mia Smith
Jason

I have this problem as well, but it extends beyond the initial data import. Recipes that use Python (specifically pandas) sample the top of the table to determine data types. I have a field that contains item numbers; nearly all of them are integers, but some have a letter suffix. Pandas' type detection treats the column as an integer just long enough to set the schema, and then, when the rest of the data arrives, the database throws errors about the type mismatch. This happens in several places, most infuriatingly in the time series resampling recipe.
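A minimal pandas sketch of this failure mode, assuming a hypothetical `item` column whose first rows are plain integers: inferring from a head sample (simulated here with `nrows`) yields an integer type, while the full column does not fit one. Passing `dtype` removes the guess. In a DSS Python recipe, `dataiku.Dataset.get_dataframe()` also takes an `infer_with_pandas` argument that may be worth checking in the API documentation for your DSS version.

```python
import io

import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical item numbers: integers at the top, a letter suffix further down.
csv = "item\n100\n101\n102\n103A\n"

# Inferring from the first rows only (like a head sample) gives an integer type.
head = pd.read_csv(io.StringIO(csv), nrows=3)
print(is_integer_dtype(head["item"]))   # True

# The full column cannot be stored as an integer.
full = pd.read_csv(io.StringIO(csv))
print(is_integer_dtype(full["item"]))   # False

# Declaring the column as a string avoids the guess entirely.
safe = pd.read_csv(io.StringIO(csv), dtype={"item": str})
print(safe["item"].tolist())            # ['100', '101', '102', '103A']
```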

me2

I had a similar problem with Dataiku's data type detection. It is definitely an area for improvement.

https://community.dataiku.com/t5/Using-Dataiku/unintended-data-filtering-from-Prepare-Recipe/m-p/375...


tgb417

I’ve been pointing out these “duck typing” challenges with columns for a while now. I’ve submitted two product ideas, and it would be great to get more feedback on them to the Dataiku team.

Please consider “voting” for either of these ideas, or adding your own product idea if neither covers your use case or suggestion.

https://community.dataiku.com/t5/Product-Ideas/The-ability-to-turn-off-Cell-level-quot-Duck-Typing-q... 

https://community.dataiku.com/t5/Product-Ideas/Override-to-Standard-quot-Duck-Typing-quot-of-Variabl...


--Tom