Organize ETL process for Data Mart

Seymour93


We are investigating how to use Dataiku to replace other ETL tools (e.g. Talend, SSIS, Oracle Data Integrator).

Are there any suggestions, documentation, or best practices describing how to use Dataiku in place of one of these specialized ETL tools?

In Dataiku, a Project can be associated with only one Flow, whereas specialized ETL tools let you build a hierarchy like the following:

|_Folder 1
   |_Mapping 1: ETL from Table A to B
   |_Mapping 2: ETL from Table B to C
   |_Mapping 3: ETL from Table B to D
   |_Mapping 4: ETL from Tables C, D and F to G

In this scenario, the different mappings act like separate Flows within the same Project.

I am not looking for a 1-to-1 replacement of other specialized ETL tools with Dataiku; instead, I am trying to:

  1. shape my mindset so that I understand Dataiku deeply
  2. make the best use of this tool by following the approaches Dataiku was designed for


  • chrisk

    One way is to use folders that can contain multiple flows. It is a great way to organize related flows, and you can even see how those flows connect in the graph view.

    Another way is to use flow filters to create different "views" of a single flow, which you can then share. This works well if you want to keep everything in one flow.

  • Seymour93

    Thank you for the prompt reply.

    1. What do you mean by "use folders that can contain multiple flows"? Could you share any resources on this?
    2. If I put all the "mappings" from my example into a single Flow, how can I organize them once deployed to production? For example, how can I run some portions of the Flow in parallel and others sequentially?
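To make the parallel/sequential question concrete, here is a generic Python sketch (not a Dataiku API, and not how Dataiku itself schedules jobs) that groups the example mappings into stages using a layered topological sort (Kahn's algorithm). Mappings in the same stage do not depend on each other and could in principle run in parallel, while the stages themselves must run one after another. The mapping names and table letters come from the example hierarchy. The function name `plan_stages`, the tuple layout, and the assumption that each table is produced by at most one mapping are illustrative choices.

```python
from collections import defaultdict

# Hypothetical encoding of the example: (mapping name, input tables, output table).
# Table F is not produced by any mapping, so it is treated as an external source.
MAPPINGS = [
    ("Mapping 1", ["A"], "B"),
    ("Mapping 2", ["B"], "C"),
    ("Mapping 3", ["B"], "D"),
    ("Mapping 4", ["C", "D", "F"], "G"),
]


def plan_stages(mappings):
    """Group mappings into sequential stages (layered topological sort).

    Mappings inside one stage have no dependency on each other.
    Assumes each table is produced by at most one mapping.
    """
    # Which mapping produces each table
    producer = {output: name for name, _, output in mappings}
    # upstream[m] = mappings whose output tables m reads
    upstream = {
        name: {producer[t] for t in inputs if t in producer}
        for name, inputs, _ in mappings
    }
    downstream = defaultdict(set)
    for name, deps in upstream.items():
        for dep in deps:
            downstream[dep].add(name)

    # Count unfinished upstream mappings; those with none are ready to run
    remaining = {name: len(deps) for name, deps in upstream.items()}
    ready = sorted(name for name, count in remaining.items() if count == 0)
    stages = []
    while ready:
        stages.append(ready)
        next_ready = []
        for name in ready:
            for child in downstream[name]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Any mapping never scheduled must be part of a dependency cycle
    if sum(len(stage) for stage in stages) != len(mappings):
        raise ValueError("cycle detected between mappings")
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(plan_stages(MAPPINGS), start=1):
        print(f"Stage {i}: {', '.join(stage)}")
```

Run against the example, this yields three stages: Mapping 1 first, then Mapping 2 and Mapping 3 together (both only need table B), then Mapping 4 (which needs both C and D). How such stages would map onto Dataiku's own production tooling is left open here.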