Enhance Dataiku - Snowflake interoperability
I have encountered several challenges with column name handling and data type management while integrating Dataiku with Snowflake. I raised a few of these points during a mission to integrate the platform on Snowflake; it was no mean feat, especially when it came to managing schemas and types.
I have also noticed that these problems are coming up more and more often on the forum.
- Dataiku replaces special characters in column names, such as `(`, `)`, and `*`, with underscores (`_`) when updating output schemas. This renaming causes downstream recipes to fail since they cannot find the expected columns.
- Implement a mechanism in Dataiku DSS to better handle columns containing special characters, preserving the original names and avoiding unnecessary renaming.
- When the "update output schemas" option is selected, columns with special characters are renamed, causing recipe failures.
- Add an option for users to preserve the original column names during schema updates. This would ensure downstream dependencies remain intact and avoid recipe confusion.
- Dataiku currently does not adequately support ARRAY data types in SQL datasets and downcasts them to STRING. This leads to mismatch errors when executing SQL queries in recipes.
- Enhance Dataiku’s handling of ARRAY types, allowing native usage without forced downcasting to STRING. This will prevent type mismatch errors and improve the integrity of data processing.
- Isolated recipe runs fail due to column renaming, whereas recursive flow builds succeed. This inconsistency creates confusion and additional troubleshooting overhead.
- Ensure consistent schema handling in both recursive flow builds and isolated recipe runs to prevent discrepancies and streamline the debugging process.
- Errors related to schema mismatches and data type issues can be challenging to diagnose due to unclear error messages.
- Improve the diagnostic information and error messaging related to schema updates and data type mismatches. This would help users identify and resolve issues more efficiently. For example, when a recipe step manipulates a column of a given type in the database, the operation will often change the column's format on the database side and trigger type errors that are hard to trace back.
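To illustrate why the underscore renaming breaks downstream dependencies, here is a minimal sketch (the `sanitize` helper is hypothetical, not Dataiku's actual implementation) of the kind of character replacement described above. Note that several distinct source columns can collapse to the same sanitized name:

```python
import re

def sanitize(name: str) -> str:
    """Replace any character outside [A-Za-z0-9_] with an underscore,
    mimicking the renaming behaviour described above (illustrative only)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)

cols = ["revenue(net)", "revenue*net", "revenue_net"]
renamed = [sanitize(c) for c in cols]
# Two distinct source columns now collide with a third, which is why
# downstream recipes can no longer find the columns they expect.
print(renamed)  # ['revenue_net_', 'revenue_net', 'revenue_net']
```

Since Snowflake accepts virtually any character inside double-quoted identifiers (e.g. `"revenue(net)"`), an option to keep the original names quoted, rather than sanitizing them, would avoid these collisions entirely.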
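On the ARRAY point: when an ARRAY column is downcast to STRING, the values generally arrive as JSON text, so a stopgap in a Python recipe is to parse them back into lists. This is a sketch assuming the downcast yields valid JSON (which Snowflake's ARRAY-to-text conversion does), not a substitute for native ARRAY support:

```python
import json

def restore_array(cell: str) -> list:
    """Parse an ARRAY value that was downcast to a JSON-formatted STRING
    back into a Python list. Treats empty/NULL cells as an empty array."""
    if not cell:
        return []
    return json.loads(cell)

# A Snowflake ARRAY such as ARRAY_CONSTRUCT(1, 2, 3) typically
# serialises to JSON text like the string below.
print(restore_array("[1, 2, 3]"))  # [1, 2, 3]
```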
Comments
- Turribeach (Neuron) · Posts: 2,163
With regards to special characters in object names, I agree in principle that this should be done and should work the way you expect, but in practice it's pretty much impossible for Dataiku to handle all the different variations in supported characters across the different data storage technologies. This means that while sensible, this idea has a very low ROI (return on investment) given the high cost to implement. In general the best approach you can take for special characters in object names is to rename all your tables and column names to match the supported characters, which usually means using just underscores and lowercase/uppercase as needed. If this can't be done on the original objects themselves, you can create views on top of the tables to remove unwanted characters. You can always use the column description if you want to have a column name that you can display in reports. Finally, if you can't create views, you can still import a dataset, rename the columns, and share the newly created dataset with curated column names.