Improve options for / consistency of handling Snowflake date/time types in DSS
DSS sets TIMESTAMP_TZ and TIMESTAMP_NTZ columns as strings in output dataset schemas. However, the DATE type is treated as a date in an output dataset even though it carries no time, while TIMESTAMP_TZ columns, which carry both a time and a time zone, are set to string. The behavior also seems to differ by recipe type (in version 11.1.2): SQL Script recipes set DATE and TIMESTAMP_LTZ columns as dates that are usable as dates elsewhere in DSS (e.g., in charts), whereas SQL Query recipes appear to set DATE and TIMESTAMP_LTZ as dates, but these show with a meaning of Date Unparsed and can't be used as dates in charts.
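To make the inconsistency easy to spot across many datasets, a schema check along these lines can help. This is a minimal sketch: the helper function is hypothetical, and the schema is assumed to be shaped like the list of column dicts (with `name`, `type`, and `originalType` keys) that the Dataiku Python API's `Dataset.read_schema()` can return for SQL datasets.

```python
# Hypothetical helper: flag temporal Snowflake columns that DSS declared
# as strings in an output dataset schema. The schema is assumed to be a
# list of {"name", "type", "originalType"} dicts, similar in shape to
# what dataiku.Dataset.read_schema() returns for SQL datasets.

SNOWFLAKE_TEMPORAL_TYPES = {"DATE", "TIMESTAMP_NTZ", "TIMESTAMP_LTZ", "TIMESTAMP_TZ"}

def find_stringified_dates(schema):
    """Return columns whose Snowflake type is temporal but whose DSS type is string."""
    return [
        col["name"]
        for col in schema
        if col.get("originalType", "").upper() in SNOWFLAKE_TEMPORAL_TYPES
        and col["type"] == "string"
    ]

# Illustrative schema, matching the behavior described above:
schema = [
    {"name": "AS_OF_DT", "type": "date", "originalType": "DATE"},
    {"name": "CREATED_AT", "type": "string", "originalType": "TIMESTAMP_TZ"},
    {"name": "LOADED_AT", "type": "string", "originalType": "TIMESTAMP_NTZ"},
]
print(find_stringified_dates(schema))  # ['CREATED_AT', 'LOADED_AT']
```

Running this over each output dataset in a flow would list exactly which columns lost their temporal typing.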
It would be great if DSS consistently worked as follows across all recipe types:
1. TIMESTAMP_TZ and TIMESTAMP_LTZ (point in time types) were both set as dates in output dataset schemas and were directly usable as dates elsewhere in DSS (e.g., in Charts).
2. The user had the option to specify that DATE and TIMESTAMP_NTZ columns (types with no time and no time zone, respectively) be treated as dates in output dataset schemas and be directly usable as dates elsewhere in DSS.
Additionally, it would be good if the output dataset schema types set by DSS didn't generate an error message when reviewing the output dataset schema. For example, DSS sets a Snowflake DATE column to date in the schema but then reports the following error when one clicks on the dataset's Schema tab: "Type mismatch for column 2 (AS_OF_DT) : 'TIMESTAMP' in dataset, 'DATE'(91:DATE) in table".
Marlan
Comments
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,161 Neuron
This is not unique to Snowflake; it affects all other sources that have date data types without a time (Oracle, BigQuery, SQL Server, etc.). The issue comes from the fact that the Date class in Java is a representation of a specific instant in time, expressed in milliseconds. Therefore, this makes it impossible to ignore the time on a Date unless you use a third-party library. Personally I think Dataiku should support date-without-time data types, as these are extremely common in data sources, data, and Python.
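For contrast, Python's standard library already makes the distinction the comment above is asking for: `datetime.date` is a pure calendar date with no time component, separate from the instant-in-time `datetime.datetime`. A short illustration:

```python
from datetime import date, datetime, timezone

# A calendar date with no time component at all:
d = date(2023, 5, 1)
# A specific instant in time (what Java's Date always is):
ts = datetime(2023, 5, 1, 14, 30, tzinfo=timezone.utc)

print(d.isoformat())   # 2023-05-01
print(ts.isoformat())  # 2023-05-01T14:30:00+00:00

# Converting an instant back to a date silently drops the time,
# which is the lossy step a dedicated date-without-time type avoids.
print(ts.date() == d)  # True
```

This is the kind of native "date without time" representation that DSS currently has no equivalent for.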
-
Hi!
We feel your pain and are currently investigating adding a new data type in DSS to support "Dates without time" values (aka DATE type in Snowflake).
Will keep you posted,
Arnaud
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 321 Neuron
Thanks @apichery - that's great to hear. Date handling has been a real pain point for us since we moved to Snowflake (previously we used Netezza and for whatever reason dates weren't much of a problem).
@Turribeach - thanks for your comment. I thought that dates had to be an instant in time too, but currently the Snowflake DATE type is treated as a date (at least to some extent) while TIMESTAMP_TZ (which is an instant in time) is not. At least that's what I found in my testing.
In any case, it'd be great if all date types (including DATE and TIMESTAMP_NTZ) could be considered dates by DSS.
Marlan
-
importthepandas Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 115 Neuron
DEFINITELY agree here as a Snowflake customer. The only type that works for us is TIMESTAMP_TZ. We have folks injecting strings into date columns to get them to work. We have essentially settled on writing out all temporal data as Dataiku dates with TIMESTAMP_TZ as the Snowflake data type. We need to be careful of session parameters with this setting, however.
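The cast-everything-to-TIMESTAMP_TZ workaround described above can be sketched as a small helper that renders the SELECT for a SQL recipe. This is only an illustration under stated assumptions: the table and column names are made up, and quoting conventions may need adjusting for a real project.

```python
# Sketch of the workaround above: build a Snowflake SELECT for a SQL
# recipe that casts chosen temporal columns to TIMESTAMP_TZ so that DSS
# treats them as dates. Table and column names here are illustrative.

def build_tz_select(table, columns, temporal_columns):
    """Render a SELECT that casts the given temporal columns to TIMESTAMP_TZ."""
    parts = []
    for col in columns:
        if col in temporal_columns:
            parts.append(f'CAST("{col}" AS TIMESTAMP_TZ) AS "{col}"')
        else:
            parts.append(f'"{col}"')
    return f'SELECT {", ".join(parts)} FROM "{table}"'

sql = build_tz_select(
    "ORDERS",
    ["ORDER_ID", "ORDER_DT", "SHIPPED_AT"],
    {"ORDER_DT", "SHIPPED_AT"},
)
print(sql)
```

Note the session-parameter caveat mentioned above: in Snowflake, how TIMESTAMP_TZ values are produced and displayed depends on session settings such as the TIMEZONE parameter, so the cast results can vary by connection configuration.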
And after the upgrade, we would also like better support for decimal data types.