Implementing SCD2 (slowly changing dimension) in Dataiku

Ankur30
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner

Hi,

I want to implement SCD type 2 in Dataiku. Is it possible to implement it using Dataiku visual recipes?

And what are the alternatives for implementing it?

Regards,

Ankur

Answers

  • Ankur30

    Good morning

    Any updates on this?

    Regards,

    Ankur.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @Ankur30

    Most of the work I’ve done with DSS has been around taking transactional data and building models. My source data already maintains SCD type 2 history. My approach is first to understand the nature of the challenge, and then to make sure that when I join two data sources, I know how each one is stored, so that I can produce the data in SCD type 2 form “as of” a particular time. In my case, this means using the visual Join recipe to join things like addresses to line items, taking into account the date range during which a particular address record was active and the date of the line item. So this can be done with visual recipes, but the way I have done it is by and large manual and relies on the Join recipe.
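To make the "as of" idea concrete, here is a minimal sketch in plain Python, not a Dataiku recipe. The record types, field names (`valid_from`, `valid_to`), and sample rows are all hypothetical, and it assumes half-open validity ranges where a missing `valid_to` means the version is still current. The range test in `address_as_of` is the condition a join between line items and address history would need to express.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressVersion:
    customer_id: int
    address: str
    valid_from: date           # first day this version was active
    valid_to: Optional[date]   # exclusive end date; None means "still current"


@dataclass
class LineItem:
    item_id: int
    customer_id: int
    order_date: date


def address_as_of(versions, customer_id, as_of):
    """Return the address active for customer_id on as_of, or None."""
    for v in versions:
        if (v.customer_id == customer_id
                and v.valid_from <= as_of
                and (v.valid_to is None or as_of < v.valid_to)):
            return v.address
    return None


def join_as_of(line_items, versions):
    """Pair each line item with the address active on its order date."""
    return [
        (item.item_id, address_as_of(versions, item.customer_id, item.order_date))
        for item in line_items
    ]


# Hypothetical sample data: customer 1 moved on 2021-06-01.
versions = [
    AddressVersion(1, "12 Oak St", date(2020, 1, 1), date(2021, 6, 1)),
    AddressVersion(1, "9 Elm Ave", date(2021, 6, 1), None),
]
line_items = [
    LineItem(100, 1, date(2020, 3, 15)),
    LineItem(101, 1, date(2021, 6, 1)),
    LineItem(102, 2, date(2021, 1, 1)),   # customer with no address history
]

joined = join_as_of(line_items, versions)
```

With half-open ranges, an order placed on the day of the move picks up the new address, and a line item whose customer has no address history is kept with `None` (left-join behaviour).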

    That said, Dataiku DSS can leverage a number of underlying data repositories, such as managed folders, Snowflake, HDFS, S3, and the like. I’m not sure whether any of these have “magic” that makes this type of connection easier.

    Looking forward to hearing what others are thinking about this question.

  • tgb417

    Here is a response from the Snowflake folks on this question: https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1 However, I don’t know how well this approach will play with the built-in visual recipes in Dataiku DSS.

  • Ankur30

    @tgb417
    Thanks for your response. With visual recipes, it is difficult to implement SCD2 in Dataiku, unlike in other ETL tools such as Informatica (using lookup and update strategy).

    Yes, the other solutions are valid: SCD2 can be implemented at the database level, for example in Snowflake, Oracle, or SQL Server.

    This is something the Dataiku community needs to look at: how we can leverage Dataiku to implement SCD2.
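As one possible starting point, the lookup-and-update logic could live in a Python code recipe. The following is only a sketch in plain Python under stated assumptions: the function `apply_scd2`, the column names `valid_from`, `valid_to`, and `is_current`, and the sample rows are hypothetical, and reading from or writing to Dataiku datasets is left out. It does not handle records deleted from the source.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Return a new dimension list with SCD2 changes from `incoming` applied.

    dimension: list of dicts holding `key`, the `tracked` columns, and the
               assumed housekeeping columns valid_from, valid_to, is_current.
    incoming:  list of dicts holding the latest snapshot of the source.
    """
    result = [dict(row) for row in dimension]      # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        old = current.get(new[key])                # the "lookup" step
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue                               # unchanged: keep the current version
        if old is not None:                        # changed: expire the old version
            old["valid_to"] = load_date
            old["is_current"] = False
        version = {key: new[key], **{c: new[c] for c in tracked}}
        version.update(valid_from=load_date, valid_to=None, is_current=True)
        result.append(version)                     # insert the new current version
    return result


# Hypothetical sample data: customer 1 changes city, 2 is unchanged, 3 is new.
dim = [
    {"customer_id": 1, "city": "Paris", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
    {"customer_id": 2, "city": "Lyon", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
]
snapshot = [
    {"customer_id": 1, "city": "Nice"},
    {"customer_id": 2, "city": "Lyon"},
    {"customer_id": 3, "city": "Lille"},
]

updated = apply_scd2(dim, snapshot, "customer_id", ["city"], date(2023, 5, 1))
```

Here the old row for customer 1 is closed with `valid_to` set to the load date, a new current row is appended for the changed and the new customer, and the unchanged customer is left alone. The same three cases (unchanged, changed, new) are what a database-level merge would also have to cover.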

    Regards,

    Ankur

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi @Ankur30, please feel free to use the Product Ideas board. It is there to let you share and exchange your ideas on how to improve Dataiku. Here is a resource to help get you started: Suggest an idea

    I hope this helps!

  • tgb417

    I've just started a project that has a slowly changing dimension component. @Ankur30, have you made any further progress on this topic since we last spoke?

    Has anyone else made progress?

    Here is a bit about my use case.

    https://community.dataiku.com/t5/Using-Dataiku/Maintaining-Sync-with-Slow-datasets-Adds-and-Modify/m-p/29079#M10902
