Basically, I want DataIku to stop changing storage types just because it thinks it knows better than me. These people seem to have the same problem.
I have large tables with ~30 million rows. For some columns, the underlying column type is string: even though almost all of the rows (including every row in the sample) are numeric, the database documentation defines these columns as string. In rare cases a value actually contains a letter, which crashes my recipe. I know that these columns contain strings, and I don't want DataIku to convert them to bigint.
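To make the failure mode concrete, here is a minimal, hypothetical Python sketch of sample-based type inference. It is not Dataiku's actual inference code. `infer_type` and `cast_column` are illustrative helpers, and the "12A4" value is made up. The sketch shows why a column whose sample is all digits gets guessed as `bigint`, and why the full run then fails on the one row that is not numeric.

```python
# Hypothetical illustration of sample-based type inference (NOT Dataiku's code).

def infer_type(values):
    """Guess 'bigint' if every value parses as an integer, else 'string'."""
    for v in values:
        try:
            int(v)
        except ValueError:
            return "string"
    return "bigint"


def cast_column(values, col_type):
    """Cast every value to the given type; raises ValueError on a mismatch."""
    if col_type == "bigint":
        return [int(v) for v in values]
    return list(values)


# A column documented as string whose values are almost always numeric.
full_column = [str(i) for i in range(10_000)]
full_column[9_500] = "12A4"  # made-up rare row that contains a letter

sample = full_column[:1_000]   # the sample happens to contain only digits
inferred = infer_type(sample)  # -> "bigint", based on the sample alone

try:
    cast_column(full_column, inferred)
except ValueError as err:
    # The type guessed from the sample does not hold for the full data.
    print(f"Recipe fails on the full data: {err}")
```

Keeping the column as `string` (i.e. not inferring) avoids the crash at the cost of storing numbers as text, which is the trade-off argued for in this post.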
How do I stop DataIku from doing this without manually changing the column type each time? I am looking for a per-project global setting, since this happens with basically every visual recipe I use.
IMO, DataIku should ideally never do this by itself: it can't know what the table will hold in the future, and users who know nothing about storage types will be confused about what is wrong with their recipe. Instead, it should suggest the change with a clear explanation and let the user approve it manually. It's better to waste a bit of storage space and compute power than to create potentially hard-to-detect problems by silently converting data types.
Hello all future readers of this post!
I wanted to share an exciting update we just released as part of V12 which should help with this frustration.
In all DSS versions prior to V12, the default behavior is to infer column types for all dataset formats. V12 introduces a new default for all new prepare recipes (existing recipes will not be changed): infer data types for loosely-typed input dataset formats (e.g. CSV) and lock them for strongly-typed ones (e.g. SQL, Parquet). There is also now an admin setting in the UI (Administration > Settings > Misc) to change this behavior if you so choose.
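The default described in this reply can be summarized as a small decision rule. Below is a minimal, hypothetical Python model of it. It does not call any Dataiku API, and the function name, format strings, and `admin_override` parameter are illustrative assumptions. Two further assumptions are labeled in the comments: formats not listed as strongly typed are treated as loosely typed, and the admin setting affects only recipes created in V12 or later.

```python
from typing import Optional

# Illustrative grouping built only from the examples in this reply (SQL, Parquet);
# the real set of strongly-typed formats in DSS is not reproduced here.
STRONGLY_TYPED_FORMATS = {"sql", "parquet"}


def default_type_handling(input_format: str,
                          created_with_version: int,
                          admin_override: Optional[str] = None) -> str:
    """Return 'infer' or 'lock' for a prepare recipe's column types (hypothetical model)."""
    if created_with_version < 12:
        # Recipes created before V12 keep the old default: always infer.
        return "infer"
    if admin_override is not None:
        # Stand-in for the admin setting (Administration > Settings > Misc).
        # Assumption: it changes the default for new recipes only.
        return admin_override
    if input_format.lower() in STRONGLY_TYPED_FORMATS:
        return "lock"
    # Assumption: any format not listed as strongly typed (e.g. CSV) is loosely typed.
    return "infer"


print(default_type_handling("CSV", 12))      # loosely typed: infer
print(default_type_handling("Parquet", 12))  # strongly typed: lock
print(default_type_handling("SQL", 11))      # pre-V12 recipe: infer
```

Under this model, the original poster's case (string columns in a strongly-typed SQL source feeding a new prepare recipe) would default to `lock`, so the storage type would no longer be silently changed to bigint.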