Move tables that are in a Data Center (SQL database) to a Google data warehouse

rafael_rosado97
Level 3
Move tables that are in a Data Center (SQL database) to a Google data warehouse

Hello everyone.
I want to move tables that are in a Data Center (SQL database) to a Google data warehouse.
I'm trying to find the most efficient way to do it through Dataiku (if possible).


I have 2 doubts:
1. Is it possible to list all tables (not necessarily DSS datasets) using a python recipe that are in a database through a connection configured from dataiku?
2. Is it better to make the connection through Python instead of the connection made by Dataiku to handle the data in the database?


The goal is to do it as automatically as possible, that's why I want to do it by Python code, maybe using SQLExecutor2.


One aspect to emphasize is that each table is updated daily, so I still want to know if it is advisable to use dataset partitions to only attach new data.

 

Can you recommend if it is a good idea?

 

Thank you very much!!!!


Operating system used: Linux

0 Kudos
1 Reply
Turribeach

@rafael_rosado97 wrote:

Hello everyone.
I want to move tables that are in a Data Center (SQL database) to a Google data warehouse.
I'm trying to find the most efficient way to do it through Dataiku (if possible).

Sync to Google Storage Bucket and then Sync to Google BigQuery


1. Is it possible to list all tables (not necessarily DSS datasets) using a python recipe that are in a database through a connection configured from dataiku?

Yes, use the metadata dictionary tables/views from your database. All databases have this. You can use a Dataiku SQL Notebook (Notebooks => New Notebook => Write your own => SQL) to develop the query. Then convert the notebook to a Sqlk Query Recipe and use the output dataset as your list of tables to migrate


2. Is it better to make the connection through Python instead of the connection made by Dataiku to handle the data in the database?

It really depends what you are trying to do. But if you want to do this programmatically then yes you will need to do it via Python. But not everything needs to be Python. For instance the query that gives you the tables to migrate can be a SQL Query dataset which you can use in a Python recipe. Then you will need to programmatically create a recipe to sync it to bucket dataset and then finally another sync recipe to a  BigQuery dataset.

0 Kudos