[BigQuery][Type] Working in flow with DATETIME (BQ) column type

UgoD
UgoD Registered Posts: 9 ✭✭✭

Hi,

I'm using Dataiku 11.2.0 Entreprise with BigQuery. (If other information about set up needed, don't mind to ask)

In my flow, I'm using a python recipe which creates a new table from a pandas dataframe, and one of the column is created using `pd.to_datetime()` in order to have a DATETIME typed column in the BigQuery table. Table is created, Dataiku is created too, everything's fine until there.

When I'm using the resulting Dataiku dataset, in a Join recipe, I get a BigQuery error about this column because I can't insert DATETIME typed datas into a TIMESTAMP typed column. (The output dataset schema is unset before the run of the Join recipe)

I've tried different things and it seems that Dataiku doesn't want to create a DATETIME typed column in BigQuery while creating the output dataset.

That's the first time I have this issue which seems to be a bug.

Where about to use a lot of DATETIME typed column in the future, so I can't just switch and use TIMESTAMP (it's not a long-term solution even if a short-term one)

If you need further information, I'll be pleased to give you more details about this.

Best regards,

Ugo


Operating system used: Windows 11

Tagged:

Best Answer

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 295 Dataiker
    edited July 17 Answer ✓

    Hi @UgoD
    ,

    DSS type system contains a subset of BigQuery type system. As a consequence, DSS will never write some data types. This includes:
    - DATE (Instead, DSS will generate TIMESTAMP columns)
    - DATETIME (Instead, DSS will generate TIMESTAMP columns)
    - NUMERIC (Instead, DSS will generate DOUBLE columns)
    - DECIMAL (Instead, DSS will generate DOUBLE columns)
    As a workaround, you can cast - in SQL recipes - the output columns to TIMESTAMP.
    For example:
    SELECT safe_cast(`cc_id_client` as bigint ) as id_client, TIMESTAMP(`dt_date`, 'UTC') as dt_date
      FROM `table`​
    This change is not required with other databases, only with BigQuery. The reason is that BigQuery does not automatically coerce DATE values into TIMESTAMP, while the other databases do.
    Thanks,
    Jordan

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 295 Dataiker
    Hi @UgoD
    ,
    In DSS 11.1.0, we changed the way BigQuery creates BigQuery tables for managed datasets to match what the other recipes do. DSS will now only create BigQuery tables with TIMESTAMP columns and no longer with DATE or DATETIME columns. This prevents issues in downstreams recipes where DSS is not capable of correctly reading back the DATE or DATETIME columns (i.e. without unexpected timeshifts).
    The workaround is to cast them to TIMESTAMP in the query.
    It's this line in the release notes
    * Fixed issue with date types coming from BigQuery
    Please let me know if you have any questions.
    Thanks!
    Jordan
  • UgoD
    UgoD Registered Posts: 9 ✭✭✭

    Hi @JordanB
    ,

    So just to be crystal clear, it's now impossible to work with BigQuery table if there is any DATETIME in it?

    What is the easiest way to convert a sql_table_dataset (connected to a BigQuery table with DATETIME typed column) to an usable dataset? Is it to use a prepare recipe?

    Best regards,

    Ugo

  • NahiaMontse
    NahiaMontse Registered Posts: 1

    Hello @JordanB
    ,

    I have a question: What happens if casting TIMESTAMP to DATE type, the field still leaves it as TIMESTAMP in BigQuery? Is there any other option that can be done from DataIku to prevent it from leaving the field as TIMESTAMP?

    Thanks in advance

Setup Info
    Tags
      Help me…