Automating BigQuery Dataset Creation via Python Notebook

Guillaume5
Guillaume5 Registered Posts: 8

Hello everyone,

I am currently working on a project where I want to automate the transfer of data from a database to BigQuery using a Python notebook in Dataiku. I have many tables involved and I aim to automate the entire process because I don't want to create each dataset manually.

I found information on dataset creation in Dataiku here: Documentation Dataiku - Dataset Creation. However, I am facing difficulties in defining the BigQuery dataset and the table name where the data should be placed.

If anyone has done something similar or has advice on how to specify the dataset and table name in the code, it would be incredibly helpful. Any code examples or additional references would also be greatly appreciated!

Thank you in advance for your help,

Guillaume

import dataiku

client = dataiku.api_client()
project_key = dataiku.default_project_key()
project = client.get_project(project_key)

# Create the managed dataset on the target connection
builder = project.new_managed_dataset("test1")
builder.with_store_into(connection="XXX")
dataset = builder.create(overwrite=True)

Operating system used: Windows 10

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,501 Neuron
    Answer ✓

    Here is some sample code changing some Dataset settings:

    import dataiku

    client = dataiku.api_client()
    project = client.get_project("project")
    dataset = project.get_dataset("dataset")
    # Load the settings, edit the raw dict in place, then save
    dataset_settings = dataset.get_settings()
    dataset_settings.get_raw()['flowOptions']['rebuildBehavior'] = 'WRITE_PROTECT'
    dataset_settings.save()
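
The same get_raw() / save() pattern can probably be pointed at the "params" block of the raw settings, which is where SQL-type datasets keep their table location. The sketch below is not part of the original answer. It assumes the BigQuery dataset is stored under params["schema"] and the table under params["table"]; those key names are an assumption and should be checked against a dataset created by hand on your own instance. The helper name set_table_location is hypothetical.

```python
def set_table_location(dataset, bq_dataset, table):
    """Point a DSS dataset at a given BigQuery dataset and table.

    `dataset` is a handle returned by project.get_dataset(...) or
    builder.create(...).
    ASSUMPTION: the location lives under params["schema"] (the BigQuery
    dataset) and params["table"]. Verify the key names on your instance.
    """
    settings = dataset.get_settings()
    # get_raw() returns the settings dict itself, so edits are made in place
    params = settings.get_raw().setdefault("params", {})
    params["schema"] = bq_dataset
    params["table"] = table
    settings.save()
    return params


# Usage inside a DSS notebook (all names below are placeholders):
#   import dataiku
#   project = dataiku.api_client().get_project(dataiku.default_project_key())
#   for name in ["customers", "orders"]:
#       builder = project.new_managed_dataset(name)
#       builder.with_store_into(connection="XXX")
#       ds = builder.create(overwrite=True)
#       set_table_location(ds, "my_bq_dataset", name)
```

Taking the dataset handle as an argument keeps the helper independent of how the dataset was created, so it can be reused in a loop over many tables.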
    

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,501 Neuron

    I would imagine you set these using get_creation_settings() and get_settings(). The easiest way to see how they should be set is to create a new managed dataset manually, then inspect its settings via the API and replicate them for the datasets you create programmatically.
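
A minimal sketch of the "inspect a manually created dataset" step, assuming the handle comes from project.get_dataset(...) rather than from dataiku.Dataset(...). The helper names describe_dataset and show are hypothetical. The sketch only reads the raw settings and makes no assumption about which keys a BigQuery dataset uses, since the point is to discover them.

```python
import json


def describe_dataset(dataset):
    """Return the parts of the raw settings that describe where data is stored.

    `dataset` is a handle from project.get_dataset(...).
    """
    raw = dataset.get_settings().get_raw()
    # .get() is used so a missing key does not raise
    return {"type": raw.get("type"), "params": raw.get("params", {})}


def show(dataset):
    """Pretty-print the storage-related settings for manual comparison."""
    print(json.dumps(describe_dataset(dataset), indent=2, sort_keys=True))


# Usage inside a DSS notebook (names are placeholders):
#   import dataiku
#   project = dataiku.api_client().get_project(dataiku.default_project_key())
#   show(project.get_dataset("MANUALLY_CREATED_DATASET"))
```

Comparing this output for a hand-made BigQuery dataset with the output for a programmatically created one should show which entries need to be set.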

  • Guillaume5
    Guillaume5 Registered Posts: 8

    Hello,

    Thanks for your suggestion. I tried to follow your advice by inspecting the settings of a manually created dataset via the API. However, I got an error when calling the get_settings() method:

    import dataiku
    mydataset = dataiku.Dataset("TEST_GJA")
    df = mydataset.get_dataframe()
    mydataset.get_settings()

    ---------------------------------------------------------------------------
    AttributeError Traceback (most recent call last)
    Cell In[35], line 1
    ----> 1 mydataset.get_settings()

    AttributeError: 'Dataset' object has no attribute 'get_settings'

    I manually created a dataset as suggested but am struggling to retrieve its settings programmatically.

    Thx

    Guillaume

  • Guillaume5
    Guillaume5 Registered Posts: 8

    Hello,

    Thanks to the guidance I received earlier, I've managed to access the settings of a dataset using the Dataiku API. However, I'm currently facing challenges in defining the 'schema' parameter for an existing dataset. Here’s the code I’ve been using:

    import dataiku

    client = dataiku.api_client()
    project = client.get_project("PYTHONSANDBOX")
    dataset = project.get_dataset("TEST_GJA")
    settings = dataset.get_settings()

    I understand that modifying dataset schemas generally involves using the settings object. However, I'm unsure how to set the 'schema' property correctly for an existing dataset. If anyone has experience or examples of how to update the schema via the API, your assistance would be greatly valued.

    Thank you in advance for your help!
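
A minimal sketch of one way to update the column schema of an existing dataset, using the get_schema() and set_schema() methods of the dataset handle. This is an illustration under stated assumptions, not the code from the reply that follows. The helper name set_columns and the column names are hypothetical, and the type strings ("bigint", "string") are assumed to be valid DSS storage types. If "schema" here means the BigQuery dataset rather than the column list, that value is, as far as I know, kept in the "params" block of the raw settings and not in the column schema.

```python
def set_columns(dataset, columns):
    """Replace the column list of a DSS dataset.

    `dataset` is a handle from project.get_dataset(...).
    `columns` is a list of (name, type) tuples, for example
    [("id", "bigint"), ("name", "string")].
    """
    schema = dataset.get_schema()
    # Keep any other schema fields and swap only the column definitions
    schema["columns"] = [{"name": n, "type": t} for n, t in columns]
    dataset.set_schema(schema)
    return schema


# Usage inside a DSS notebook (names are placeholders):
#   import dataiku
#   project = dataiku.api_client().get_project("PYTHONSANDBOX")
#   set_columns(project.get_dataset("TEST_GJA"),
#               [("id", "bigint"), ("name", "string")])
```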

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,501 Neuron
  • Guillaume5
    Guillaume5 Registered Posts: 8

    Thanks so much! Your code worked perfectly for setting the parameters, including the schema. I really appreciate your help!
