Transfer a flow

ebbingcasa Registered Posts: 24 ✭✭✭✭✭

Hi there,

I'm still new to Dataiku, so I'd like to know how to plan ahead. Say I want to reuse a flow for another country's data: what do I need to keep in mind when planning the flow? Is there an option to, for example, copy and paste everything and then find/replace a country's name across all created datasets?

Thanks!

Answers

  • Mateusz
    Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭

    Hi @ebbingcasa

    I think it depends heavily on the project and on what you do with the data in the end. For example, if you are building a dashboard and both the flow's calculations/logic and the data source are the same, you can basically use filters to add or remove countries. If not, you can indeed copy the flow, or copy it recipe by recipe. Still, I would suggest looking for a way to keep everything in one flow if possible, especially if there is a single data source (such as a SQL database) for all the countries.

    Thanks

    Mateusz

  • ebbingcasa
    ebbingcasa Registered Posts: 24 ✭✭✭✭✭

    Hi Mateusz,

    thanks, makes perfect sense.

    In my case it all starts with several API calls. So what you're suggesting is that I should start with a Python recipe that makes the API calls for all countries, then do all the data wrangling, and filter by country as late as possible so that I don't have to copy steps for each country, correct?

    Best,
    Peter

  • Mateusz
    Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭

    I don't want to suggest a best solution, as I don't know the project.

    But if you know that the data wrangling is the same for all countries (or a large part of it is), I would even filter in the first recipes to improve refresh/flow performance, and then modify only the specific recipes where the wrangling differs between countries.

  • ebbingcasa
    ebbingcasa Registered Posts: 24 ✭✭✭✭✭

    Ok, so I'm more used to having all these steps organized in scripts, but I'd like to do it more visually in Dataiku so that it is easier for everybody else to understand:

    Is there a way to apply the same flow multiple times to different input datasets to get different output datasets at the end of a flow? Something like:

    {input dataset 1st country | input dataset 2nd country | input dataset 3rd country} => apply the same wrangling to each, e.g. via Prepare recipe(s) => {output dataset 1st country | output dataset 2nd country | output dataset 3rd country}

    I'm thinking that concatenating would worsen performance, whereas splitting/filtering leads to having to copy the same wrangling steps, if I'm not mistaken?

  • ebbingcasa
    ebbingcasa Registered Posts: 24 ✭✭✭✭✭

    That connected the dots to modules in Python etc., thanks a lot!
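
For readers who think in scripts, the pattern discussed in this thread (filter by country early, then run one shared set of wrangling steps per country) can be sketched roughly as below. This is plain Python, not Dataiku's API: the column names `country` and `revenue`, the sample rows, and the helper names `filter_country`, `wrangle`, and `run_per_country` are all hypothetical and only there for illustration. In a real project the rows would come from the flow's input datasets.

```python
from typing import Dict, List

Row = Dict[str, object]


def filter_country(rows: List[Row], country: str) -> List[Row]:
    """Keep only the rows for one country (the early filter step)."""
    return [row for row in rows if row["country"] == country]


def wrangle(rows: List[Row]) -> List[Row]:
    """Shared wrangling, written once and reused for every country.

    Hypothetical steps: drop rows without revenue, cast revenue to float.
    """
    cleaned = []
    for row in rows:
        if row.get("revenue") is None:
            continue  # skip incomplete rows
        cleaned.append({**row, "revenue": float(row["revenue"])})
    return cleaned


def run_per_country(rows: List[Row], countries: List[str]) -> Dict[str, List[Row]]:
    """Produce one output per country from the same wrangling logic."""
    return {c: wrangle(filter_country(rows, c)) for c in countries}


# Hypothetical sample input standing in for the API results.
raw = [
    {"country": "DE", "revenue": "10.5"},
    {"country": "FR", "revenue": "7.1"},
    {"country": "DE", "revenue": None},
]

outputs = run_per_country(raw, ["DE", "FR"])
```

Adding a country then means adding one entry to the list passed to `run_per_country`, rather than copying the wrangling steps. Country-specific differences would go into separate functions applied after `wrangle`.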
