Google BigQuery in Dataiku

nv
nv Registered Posts: 11 ✭✭✭✭
Is there a feature to connect to Google BigQuery, like the Amazon S3 connector under "Cloud storage"?

Answers

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    Google BigQuery is not currently supported
  • NiteshK
    NiteshK Registered Posts: 3 ✭✭✭✭

    I want to write to Google BigQuery via the entries fed into a web app created in Dataiku.

    Is there any help available for that?

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @NiteshK

    I don't know which version of DSS you are using. However, it appears that Google BigQuery is natively supported by DSS only in the paid version. If you are using the community edition, this feature is not directly available. (That said, you might be able to use a Python or R library to "roll your own".)

    --Tom

  • phb
    phb Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 8 ✭✭✭✭

    Tom, might one possibility for users of the community edition be to pull BigQuery data (for example, their Google Analytics dataset) into AWS, and then pull the data from AWS into Dataiku?

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @phb

    Yes, that might be possible.

    What I was thinking of was using a Python library like the one described here:

    https://googleapis.dev/python/bigquery/latest/index.html

    With Python code somewhat like this:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Perform a query.
    QUERY = (
        'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
        'WHERE state = "TX" '
        'LIMIT 100')
    query_job = client.query(QUERY)  # API request
    rows = query_job.result()  # Waits for query to finish

    for row in rows:
        print(row.name)

    Here is some further documentation: https://cloud.google.com/bigquery/docs/reference/libraries#python

    ** I've not tested the above to connect directly to BigQuery.
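    (Assuming the standard client setup: the google-cloud-bigquery package would also need to be installed in the DSS code environment, with credentials made available to the client, for example through a service account key referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable.)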
  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    Hi,

    Writing into BigQuery, however, is a complicated topic. You cannot simply write one record after another: BigQuery is an analytical database designed for very-large-scale analytics workloads, not at all for online transaction processing (i.e. modifying records one by one).

    The only way to add data to BigQuery is to add that data to a "Google Cloud Storage" dataset and then to sync this GCS dataset to BigQuery, which uses BigQuery's fast-load capabilities.

    Writing to a GCS dataset is covered by the regular dataset write APIs of Dataiku.
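
    For illustration only, a rough (untested) sketch of what that GCS-to-BigQuery fast-load step looks like with the google-cloud-bigquery Python client; the project, dataset, table, and bucket names below are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Placeholder names for the destination table and a file previously written to GCS.
    table_id = "my-project.my_dataset.my_table"
    gcs_uri = "gs://my-bucket/exports/my_data.csv"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the CSV header row
        autodetect=True,      # let BigQuery infer the schema
    )

    # A bulk load job from GCS, not row-by-row inserts.
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load job to finish
    print(client.get_table(table_id).num_rows, "rows in destination table")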

  • NiteshK
    NiteshK Registered Posts: 3 ✭✭✭✭

    Thanks a lot @tgb417 for your response!

    I have Version 7.0.2.

    I'm planning to create a "Python function" endpoint whose Python code interacts with and manages BigQuery.

    Additionally, it should be able to interact with an existing web app and accept input JSON from it.

    I hope this is doable in Dataiku?
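
    For reference, a rough, untested sketch of that kind of helper, combining the staging approach Clément describes with the dataiku Python API (the dataset name "bq_staging" is a placeholder for a GCS-backed dataset that would later be synced to BigQuery):

    import dataiku
    import pandas as pd

    def append_record(record_json):
        """Append one JSON record (a dict) to a GCS staging dataset managed by DSS."""
        staging = dataiku.Dataset("bq_staging")  # placeholder dataset name
        existing = staging.get_dataframe()       # read the current contents
        new_row = pd.DataFrame([record_json])    # one-row DataFrame from the JSON dict
        updated = pd.concat([existing, new_row], ignore_index=True)
        # Naive approach: rewrites the whole dataset; fine for a small staging table.
        staging.write_with_schema(updated)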
