Automatically reload schema from table

ZMehdi Registered Posts: 1
edited July 16 in Using Dataiku

Hi,

I created a Python recipe that defines a SQL query and runs it on a BigQuery table through the Python BigQuery API. My recipe looks like this:

import dataiku
from bq_manager import run_bq_query

query = """
CREATE OR REPLACE TABLE `project.dataset.output_table` as
SELECT ...
"""
run_bq_query(query)

The query runs successfully; however, the schema of the created dataset is empty:

[Screenshot: Capture.PNG]

When I reload the schema manually, everything works fine. But I need to automate this reload and add it inside the recipe.

I have tried this solution: https://community.dataiku.com/t5/Using-Dataiku/reload-dataset-schema-programmatically/m-p/21410

But I get this error:

DataikuException: java.lang.ClassCastException: Cannot cast com.dataiku.dip.datasets.sql.ManagedSQLTableDatasetTestHandler to com.dataiku.dip.datasets.sql.ExternalSQLDatasetTestHandler

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 293 Dataiker

    Hi @ZMehdi,

    autodetect_settings() is intended for "input" datasets, not for datasets that are in the middle of the Flow. For those (aka "managed datasets"), the schema's source of truth is the dataset definition, not the underlying table.
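Since the dataset definition holds the schema for a managed dataset, one option is to write the schema explicitly from the recipe. The following is only a minimal sketch: the BigQuery-to-DSS type mapping and the helper names `to_dss_columns` and `write_output_schema` are illustrative assumptions, and it assumes the target is a declared output of the recipe and that `Dataset.write_schema` is available in your DSS version.

```python
# Assumed, deliberately small mapping from BigQuery types to DSS storage types.
# Anything not listed falls back to "string"; adjust for your own tables.
BQ_TO_DSS = {
    "STRING": "string",
    "INT64": "bigint",
    "INTEGER": "bigint",
    "FLOAT64": "double",
    "FLOAT": "double",
    "BOOL": "boolean",
    "BOOLEAN": "boolean",
    "TIMESTAMP": "date",
}


def to_dss_columns(bq_fields):
    """Turn (column_name, bigquery_type) pairs into DSS schema column dicts."""
    return [
        {"name": name, "type": BQ_TO_DSS.get(bq_type.upper(), "string")}
        for name, bq_type in bq_fields
    ]


def write_output_schema(dataset_name, bq_fields):
    """Write the schema onto a recipe output dataset (hypothetical helper)."""
    import dataiku  # only importable inside DSS

    dataiku.Dataset(dataset_name).write_schema(to_dss_columns(bq_fields))
```

Inside the recipe this might be called as `write_output_schema("output_table", [("id", "INT64"), ("label", "STRING")])`, where the dataset name and the column list are placeholders for your own.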

    What you may be looking for is schema propagation, which propagates schema changes across the Flow from left to right (https://doc.dataiku.com/dss/latest/python-api/flow.html#schema-propagation).
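As a rough illustration of the schema propagation API, the sketch below starts a propagation from a given dataset and waits for it to finish. The wrapper name `propagate_schema_from` and the dataset name in the usage comment are assumptions; the project handle is passed in so the function can be exercised outside DSS.

```python
def propagate_schema_from(project, dataset_name):
    """Propagate schema changes downstream, starting at dataset_name.

    `project` is expected to be a DSS project handle, e.g. the object
    returned by dataiku.api_client().get_default_project().
    """
    flow = project.get_flow()
    propagation = flow.new_schema_propagation(dataset_name)
    future = propagation.start()       # launches the propagation job
    return future.wait_for_result()    # blocks until it completes


# Usage inside DSS (dataset name is a placeholder):
# import dataiku
# project = dataiku.api_client().get_default_project()
# propagate_schema_from(project, "my_source_dataset")
```

Note that propagation pushes changes from the starting dataset to the datasets to its right, so the starting dataset's own schema needs to be correct first.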

    Also note that in DSS 9.0, we introduced both "reload schema of input dataset from table" and "propagate schema across Flow" as scenario steps, which allow you to automate that without code.

    Thanks,

    Jordan
