Google BigQuery in Dataiku
Best Answer
-
BigQuery has been officially and fully supported since DSS 4.2.
Answers
-
Google BigQuery is not currently supported
-
A connector is now available in DSS 3.1: https://doc.dataiku.com/dss/latest/connecting/sql/bigquery.html
-
I want to write entries submitted through a webapp created in Dataiku to Google BigQuery.
Is there any help available for that?
-
tgb417
I don't know which version of DSS you are using. However, it appears that Google BigQuery is natively supported by DSS only in the paid version. If you are using the community edition, this feature is not directly available. (That said, you might be able to use a Python or R library to "roll your own".)
--Tom
-
phb
Tom, might one possibility for users of the community edition be to pull BigQuery data (for example, their Google Analytics dataset) into AWS, and then pull the data from AWS into Dataiku?
-
tgb417
Yes, that might be possible.
What I was thinking of was using a Python library like the one described here:
https://googleapis.dev/python/bigquery/latest/index.html
The Python code would look somewhat like this:
from google.cloud import bigquery

client = bigquery.Client()

# Perform a query.
QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

for row in rows:
    print(row.name)
Here is some further documentation: https://cloud.google.com/bigquery/docs/reference/libraries#python
** I have not tested the above code against BigQuery.
-
Hi,
Writing into BigQuery, however, is a complicated topic. You cannot simply write one record after another. BigQuery is an analytical database designed for very-large-scale analytics workloads, not for online transaction processing (i.e. modifying records one by one).
In DSS, the practical way to add data to BigQuery is to write it to a "Google Cloud Storage" (GCS) dataset and then sync this GCS dataset to BigQuery, which uses BigQuery's fast load capabilities.
Writing to a GCS dataset is covered by the regular dataset write APIs of Dataiku.
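To illustrate the point about not writing record by record, here is a minimal, untested sketch that buffers webapp entries and hands them to a writer in batches. The class name `EntryBuffer`, the batch size, and the dataset name `webapp_entries_gcs` are all assumptions made up for this example. The writer assumes a DSS dataset stored on GCS already exists and uses the regular `dataiku.Dataset(...).write_with_schema(...)` call. Check whether your dataset appends or overwrites on write before relying on it.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class EntryBuffer:
    """Collects webapp entries and passes them to a writer in batches."""

    def __init__(self, write_batch: Callable[[List[Row]], None], batch_size: int = 500):
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._rows: List[Row] = []

    def add(self, row: Row) -> None:
        """Store one entry; write out a batch once enough entries are buffered."""
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write out whatever is buffered (no-op when the buffer is empty)."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._write_batch(batch)


def write_to_gcs_dataset(rows: List[Row]) -> None:
    """Assumed writer: sends one batch to a DSS dataset stored on GCS."""
    # Imported lazily so the buffering logic can be tried outside DSS.
    import dataiku
    import pandas as pd

    # "webapp_entries_gcs" is a hypothetical dataset name.
    dataset = dataiku.Dataset("webapp_entries_gcs")
    dataset.write_with_schema(pd.DataFrame(rows))
```

Inside a webapp backend, this could be used as `buffer = EntryBuffer(write_to_gcs_dataset)`, calling `buffer.add(entry)` for each submission and `buffer.flush()` on a schedule or at shutdown. The GCS dataset would then be synced to BigQuery as described above.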
-
Thanks a lot for your response, @tgb417!
I am on version 7.0.2.
I am planning to create a 'Python Function' containing Python code that interacts with and manages BigQuery.
It should also be able to interact with an existing webapp and accept input JSON from it.
Is this doable in Dataiku?
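One way such a function might look is sketched below. It is untested against BigQuery. It validates a JSON payload coming from a webapp and then loads the collected rows with a single load job through `load_table_from_json` from the `google-cloud-bigquery` library. The field names in `EXPECTED_FIELDS`, the function names, and the table ID are hypothetical, and the sketch assumes that credentials are available to `bigquery.Client()` and that the destination table exists or its schema can be detected.

```python
import json
from typing import Dict, List

# Hypothetical shape of a webapp entry: field name -> expected Python type.
EXPECTED_FIELDS = {"name": str, "state": str, "count": int}


def parse_entry(payload: str) -> Dict[str, object]:
    """Parse a JSON string from the webapp and keep only the expected fields."""
    entry = json.loads(payload)
    if not isinstance(entry, dict):
        raise ValueError("payload must be a JSON object")
    row: Dict[str, object] = {}
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in entry:
            raise ValueError(f"missing field: {field}")
        value = entry[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
        row[field] = value
    return row


def load_rows(rows: List[Dict[str, object]], table_id: str) -> int:
    """Load a batch of rows into BigQuery with one load job."""
    # Imported lazily so parse_entry can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    # table_id is assumed to look like "my-project.my_dataset.my_table".
    job = client.load_table_from_json(rows, table_id)
    job.result()  # Waits for the load job to finish
    return job.output_rows
```

A webapp backend could call `parse_entry` on each request body, collect the resulting rows, and pass them to `load_rows` in batches rather than one at a time.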