We use Google BigQuery a lot; it's where all our data begins. We're seeing real slowness when reading data from BigQuery into DSS. As a workaround, we export the data to Google Cloud Storage and then import it into DSS, which is much faster.
However, we still need to read from BigQuery to do the initial load into GCS, and that part is really slow. Here is an example, reading 1 million rows in batches of 2k:
By contrast, the batch size and import speed from GCS are much higher; here are the same 1 million rows coming from GCS:
Is there any way to increase this batch size, or speed up imports from BigQuery?
If you use the Sync recipe to perform the data copy from BigQuery to GCS, an optimized "BigQuery to GCS" engine will be used. Are you using a Sync recipe or another kind of recipe? Do you see the "BigQuery to GCS" engine selected? If not, you can click on the engine selection widget in the Sync recipe to get more details about why it was not selected.
Thanks for your response - I was using the Prepare recipe, not Sync. However, having tested out Sync, I see the same level of performance:
Looking into the engine I see:
My BigQuery dataset is built on a SQL query; I assume this relates to that? Is there any way around it? Otherwise I will have to create this table in BQ as an additional step, which I'd prefer to avoid.
The BigQuery dataset in DSS must indeed be in "table" mode. However, "table" mode in DSS can also read views, so the simplest approach is to create a view in BigQuery wrapping your query (this way, no data copy is required).
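For example, you could wrap the existing query in a BigQuery view with standard DDL — a minimal sketch, where the project, dataset, and table names are placeholders, and the SELECT stands in for your actual query:

```sql
-- Hypothetical names: replace my_project, my_dataset, source_table,
-- and the SELECT body with your own project and query.
-- A view stores only the query definition, so no data is copied.
CREATE OR REPLACE VIEW `my_project.my_dataset.my_view` AS
SELECT *
FROM `my_project.my_dataset.source_table`;
```

The DSS dataset can then point at `my_view` in "table" mode, which should make the optimized "BigQuery to GCS" engine eligible for the Sync recipe.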