Organize ETL process for Data Mart

Seymour93
Level 2
Hi,

Context
We are investigating how to use Dataiku to replace other ETL tools (e.g. Talend, SSIS, Oracle Data Integrator).

Question
Are there any suggestions, documentation, or best practices describing how to use Dataiku to replace one of those specialized ETL tools?

Example
In Dataiku, a Project can be associated with only one Flow, whereas in other specialized ETL tools you can have a hierarchy like the following:

Project
|_Folder 1
     |_Mapping 1: ETL from Table A to B
     |_Mapping 2: ETL from Table B to C
     |_Mapping 3: ETL from Table B to D
     |_Mapping 4: ETL from Table C, D and F to G

In this scenario, the different mappings are like separate Flows within the same Project.

Notes
I am not looking for a 1-to-1 replacement of other specialized ETL tools with Dataiku. Instead, I am trying to:

  1. shape my mind-set to deeply understand Dataiku
  2. make the best use of this tool by following the correct approaches designed by Dataiku
2 Replies
chrisk
Level 1

One way is to use folders that can contain multiple flows. It is a great way to organize related flows, and you can even see visually how those flows connect in the graph view.

Another way is to use the flow filters to create different "views" of one flow. You can then share those "views". This is a good option if you want to have only one flow.

Seymour93
Level 2
Author

Thank you for the prompt reply.

  1. what do you mean by "use folders that can contain multiple flows"? Could you please share any resource on this?
  2. If I include all the "mappings" from my example in one single Flow, how can I organize them when they are deployed to production? For example, how can I run some portions of the Flow in parallel and others sequentially?
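
To make the second question concrete, here is a minimal sketch in plain Python (it does not use any Dataiku API) that derives which mappings from my example could run in parallel and which must wait, based only on the tables they read and write. The names `MAPPINGS` and `execution_stages` are hypothetical, and Table F is assumed to be an existing source table that no mapping produces.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each mapping from the example: name -> (input tables, output table).
# Assumption: Table F already exists and is not produced by any mapping.
MAPPINGS = {
    "Mapping 1": ({"A"}, "B"),
    "Mapping 2": ({"B"}, "C"),
    "Mapping 3": ({"B"}, "D"),
    "Mapping 4": ({"C", "D", "F"}, "G"),
}


def execution_stages(mappings):
    """Group mappings into stages; mappings in one stage do not depend on each other."""
    # Which mapping produces each table.
    producer = {out: name for name, (_, out) in mappings.items()}
    # A mapping depends on the mappings that produce its input tables.
    graph = {
        name: {producer[table] for table in inputs if table in producer}
        for name, (inputs, _) in mappings.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


for number, stage in enumerate(execution_stages(MAPPINGS), start=1):
    print(f"Stage {number}: {', '.join(stage)}")
```

With the example above, Mapping 1 runs first, Mappings 2 and 3 can run in parallel once Table B exists, and Mapping 4 runs last. What I would like to understand is how this kind of ordering is expressed when the mappings all live in one Flow.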