Need help with efficient BigQuery to Snowflake data transfer in Dataiku

Zhikun
Zhikun Registered Posts: 1 ✭✭

Hi there!

I'm setting up a pipeline to move data from BigQuery to Snowflake and really need some advice from folks who've done this before.

Right now I'm using the standard DSS engine but it's painfully slow for larger datasets - takes forever to process even moderate amounts of data.

I'm not sure what route is actually the most efficient. Has anyone tackled something similar? What approach worked well for you?

Would really appreciate hearing about your experiences!

Thanks!

Operating system used: any

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,577 Neuron
    Answer ✓

    DSS engine means the data is batch streamed via the DSS server which means it will be slow. The fastest way to load data into BigQuery is via a GCS Bucket. You can use the Sync recipe to sync data to a GCS bucket in a BigQuery compatible format. You can also setup your BigQuery connection to use Automatic fast-write which requires the use of a GCS bucket to allow for fast bulk load into BigQuery transparently. But either way you should be using a GCS bucket.

    I am not familiar with the Snowflake side of things but the above solves 50% of your problem. Now you just need to figure out how to extract the data quickly into GCS.

Setup Info
    Tags
      Help me…