[BigQuery][Type] Working in flow with DATETIME (BQ) column type
Hi,
I'm using Dataiku 11.2.0 Enterprise with BigQuery. (If you need other information about the setup, feel free to ask.)
In my flow, I'm using a Python recipe which creates a new table from a pandas dataframe; one of the columns is created using `pd.to_datetime()` in order to get a DATETIME-typed column in the BigQuery table. The table is created, the Dataiku dataset is created too, and everything is fine up to that point.
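For context, the recipe looks roughly like this (the dataset and column names are illustrative, not my real ones):

```python
import dataiku
import pandas as pd

# Build the dataframe; pd.to_datetime() gives dt_date a datetime64 dtype,
# which is what produces the DATETIME column on the BigQuery side.
df = pd.DataFrame({
    "id_client": [1, 2],
    "dt_date": ["2023-01-15", "2023-02-20"],
})
df["dt_date"] = pd.to_datetime(df["dt_date"])

# Write to the managed BigQuery dataset; DSS infers the schema from the dtypes.
output = dataiku.Dataset("my_output_dataset")
output.write_with_schema(df)
```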
When I use the resulting Dataiku dataset in a Join recipe, I get a BigQuery error about this column, because DATETIME-typed data can't be inserted into a TIMESTAMP-typed column. (The output dataset schema is unset before the Join recipe runs.)
I've tried different things, and it seems that Dataiku refuses to create a DATETIME-typed column in BigQuery when creating the output dataset.
This is the first time I've had this issue, which seems to be a bug.
We're about to use a lot of DATETIME-typed columns in the future, so I can't just switch to TIMESTAMP (it may work as a short-term solution, but not as a long-term one).
If you need further information, I'll be pleased to give you more details about this.
Best regards,
Ugo
Operating system used: Windows 11
Best Answer
-
JordanB (Dataiker)
Hi @UgoD,

The DSS type system contains a subset of the BigQuery type system. As a consequence, DSS will never write certain data types. This includes:
- DATE (instead, DSS will generate TIMESTAMP columns)
- DATETIME (instead, DSS will generate TIMESTAMP columns)
- NUMERIC (instead, DSS will generate DOUBLE columns)
- DECIMAL (instead, DSS will generate DOUBLE columns)

As a workaround, you can cast the output columns to TIMESTAMP in SQL recipes.
For example:

```sql
SELECT
  SAFE_CAST(`cc_id_client` AS BIGINT) AS id_client,
  TIMESTAMP(`dt_date`, 'UTC') AS dt_date
FROM `table`
```
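The explicit cast matters because of how BigQuery handles DATE values; as a minimal illustration (the table and column names here are hypothetical):

```sql
-- Hypothetical table with a TIMESTAMP column:
CREATE TABLE `my_dataset.events` (ts_col TIMESTAMP);

-- Fails: BigQuery does not coerce DATE to TIMESTAMP.
INSERT INTO `my_dataset.events` (ts_col) VALUES (DATE '2022-10-21');

-- Works: the DATE is converted explicitly.
INSERT INTO `my_dataset.events` (ts_col) VALUES (TIMESTAMP(DATE '2022-10-21', 'UTC'));
```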
This change is not required with other databases, only with BigQuery. The reason is that BigQuery does not automatically coerce DATE values into TIMESTAMP, while other databases do.

Thanks,
Jordan
Answers
-
JordanB (Dataiker)

Hi @UgoD,

In DSS 11.1.0, we changed the way DSS creates BigQuery tables for managed datasets to match what the other recipes do. DSS will now only create BigQuery tables with TIMESTAMP columns, and no longer with DATE or DATETIME columns. This prevents issues in downstream recipes where DSS is not able to correctly read back the DATE or DATETIME columns (i.e. without unexpected time shifts). The workaround is to cast them to TIMESTAMP in the query.

DSS 11.1.0 release notes: https://doc.dataiku.com/dss/latest/release_notes/11.html#version-11-1-0-october-21st-2022

It's this line in the release notes:
* Fixed issue with date types coming from BigQuery

Please let me know if you have any questions.

Thanks!
Jordan
-
Hi @JordanB,

So just to be crystal clear, is it now impossible to work with a BigQuery table if there is any DATETIME column in it?

What is the easiest way to convert a sql_table_dataset (connected to a BigQuery table with a DATETIME-typed column) into a usable dataset? Is it to use a Prepare recipe?
Best regards,
Ugo
-
Hello @JordanB,

I have a question: what happens if, even after casting the TIMESTAMP to DATE, the field is still written as TIMESTAMP in BigQuery? Is there any other option in Dataiku to prevent the field from ending up as TIMESTAMP?

Thanks in advance