Google BigQuery in Dataiku

Level 2
Is there a feature to connect to Google BigQuery, like the Amazon S3 connector under "cloud storage"?
Dataiker
Google BigQuery is not currently supported.
Dataiker
A connector is now available in DSS 3.1: https://doc.dataiku.com/dss/latest/connecting/sql/bigquery.html
Jeremy, Product Manager at Dataiku
Dataiker Alumni
BigQuery has been officially and fully supported since DSS 4.2.
Level 1

I want to write the entries submitted through a webapp created in Dataiku to Google BigQuery.

Is there any help available for that?

Level 6

@NiteshK 

I don't know which version of DSS you are using. However, it appears that Google BigQuery is natively supported by DSS only in the paid version. If you are using the community edition, this feature is not directly available. (That said, you might be able to use a Python or R library to "roll your own".)

--Tom

Level 2

Tom, might one possibility for users of the community edition be to pull BigQuery data (for example, their Google Analytics dataset) into AWS, and then pull the data from AWS into Dataiku?

Level 6

@phb 

Yes, that might be possible.  

What I was thinking of was using a Python library like the one described here:

https://googleapis.dev/python/bigquery/latest/index.html

The Python code would look somewhat like this:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Perform a query.
    QUERY = (
        'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
        'WHERE state = "TX" '
        'LIMIT 100')
    query_job = client.query(QUERY)  # API request
    rows = query_job.result()  # Waits for query to finish

    for row in rows:
        print(row.name)

Here is some further documentation: https://cloud.google.com/bigquery/docs/reference/libraries#python

Note: I have not tested the code above against a live BigQuery connection.
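
For the writing side of the question, a minimal sketch along the same lines might look like the following. It assumes the same google-cloud-bigquery library, whose client has a load_table_from_json method that starts a batch load job. The helper name load_entries and the table ID in the comments are made up for illustration, the destination table is assumed to already exist, and like the query above this has not been run against a real project.

```python
def load_entries(client, table_id, entries):
    """Load a batch of dict rows into a BigQuery table using one load job.

    `client` is expected to be a google.cloud.bigquery.Client.
    `table_id` is a placeholder such as "my-project.my_dataset.my_table".
    Returns the number of rows handed to the load job.
    """
    if not entries:
        return 0  # nothing to load, so skip the API call entirely
    job = client.load_table_from_json(entries, table_id)  # starts a load job
    job.result()  # waits for the load to finish and raises if it failed
    return len(entries)


# Hypothetical usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   load_entries(client, "my-project.my_dataset.webapp_entries",
#                [{"name": "Ada", "state": "TX"}])
```

The client is passed in as an argument so the helper can be tried with a stand-in object before pointing it at a real project.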
--Tom
Level 1

Thanks a lot for your response, @tgb417!

I am on DSS version 7.0.2.

I am planning to create a 'Python Function' containing a Python function that interacts with and manages BigQuery.

Additionally, it should be able to interact with an existing webapp and accept JSON input from it.

I hope this is doable in Dataiku?


Hi,

Writing into BigQuery, however, is a complicated topic. You cannot simply write one record after another. BigQuery is an analytical database designed for very-large-scale analytics workloads, not for online transaction processing (i.e. modifying records one by one).

Within DSS, the only way to add data to BigQuery is to write that data to a "Google Cloud Storage" dataset and then sync this GCS dataset to BigQuery, which uses BigQuery's fast load capabilities.

Writing to a GCS dataset is covered by the regular dataset write APIs of Dataiku.
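
As a rough illustration of that batching idea, the sketch below buffers the JSON entries coming from the webapp and hands them over in batches instead of one record at a time. The class EntryBuffer, the helper write_to_gcs_dataset, and the dataset name webapp_entries_gcs are hypothetical. The helper assumes it runs inside DSS, where the dataiku package and pandas are available, and uses Dataset.write_with_schema. A Sync recipe from that GCS dataset to a BigQuery dataset would still have to be set up and run separately.

```python
import json


class EntryBuffer:
    """Collects webapp entries in memory and flushes them in batches."""

    def __init__(self, write_batch, batch_size=500):
        self.write_batch = write_batch  # callable taking a list of dict rows
        self.batch_size = batch_size
        self.rows = []

    def add_json(self, payload):
        """Parse one JSON object sent by the webapp and buffer it."""
        row = json.loads(payload)
        if not isinstance(row, dict):
            raise ValueError("expected a JSON object")
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered rows in one call and empty the buffer."""
        if self.rows:
            batch, self.rows = self.rows, []
            self.write_batch(batch)


def write_to_gcs_dataset(rows, dataset_name="webapp_entries_gcs"):
    """Write rows to a DSS dataset (hypothetical name); only works inside DSS."""
    import dataiku  # provided by the DSS Python environment
    import pandas as pd

    dataiku.Dataset(dataset_name).write_with_schema(pd.DataFrame(rows))
```

In a webapp backend you would create one EntryBuffer(write_to_gcs_dataset), call add_json for each submitted entry, and call flush before triggering the sync. As far as I know, write_with_schema replaces the dataset contents rather than appending to them, so check which behavior you need before relying on several flushes in a row.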
