Implementing SCD2 (slowly changing dimension) in Dataiku
Hi,
I want to implement SCD type 2 in Dataiku. Is it possible to implement it using Dataiku visual recipes?
And what alternatives are there for implementing it?
Regards,
Ankur
Answers
-
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner
Good morning
Any updates on this?
Regards,
Ankur.
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Most of the work I’ve done with DSS has involved taking transactional data and building models. In my source data, SCD type 2 history is maintained. My approach has been to first understand the nature of the challenge, and then, whenever I join two data sources, to make sure I know how each one stores its data, so that I can produce the data in SCD type 2 form “as of” a particular time. In my case this means using the visual Join recipe to join things like addresses to line items, taking into account both the date range during which a particular address record was active and the date of the line item. So this can be done with visual recipes. However, the way I have done it is by and large manual and depends on how the Join recipes are set up.
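For anyone who wants to see the “as of” logic spelled out, here is a minimal plain-Python sketch of the date-range join described above. It is an illustration under stated assumptions, not a Dataiku API: the record types `AddressVersion` and `LineItem`, their column names (`valid_from`, `valid_to`, `line_date`), and the convention that `valid_to` is exclusive with `None` meaning “still current” are all hypothetical. Each line item is matched to the address version that was active on the line item's date, behaving like a left join.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical SCD2 address dimension: one row per version of a customer's address.
@dataclass
class AddressVersion:
    customer_id: int
    address: str
    valid_from: date             # first day this version was active (inclusive)
    valid_to: Optional[date]     # end of validity (exclusive); None = still current


# Hypothetical fact table row.
@dataclass
class LineItem:
    item_id: int
    customer_id: int
    line_date: date


def as_of_join(items, addresses):
    """Left-join each line item to the address version active on its line_date.

    Join conditions (assumed, mirroring the date-range join described above):
      item.customer_id == address.customer_id
      AND address.valid_from <= item.line_date
      AND (address.valid_to IS NULL OR item.line_date < address.valid_to)
    """
    by_customer = {}
    for a in addresses:
        by_customer.setdefault(a.customer_id, []).append(a)

    result = []
    for item in items:
        match = None
        for a in by_customer.get(item.customer_id, []):
            if a.valid_from <= item.line_date and (
                a.valid_to is None or item.line_date < a.valid_to
            ):
                match = a
                break
        # Unmatched items keep a None address, as in a left join.
        result.append((item.item_id, match.address if match else None))
    return result


# Example data (invented for illustration).
addresses = [
    AddressVersion(1, "12 Oak St", date(2020, 1, 1), date(2021, 6, 1)),
    AddressVersion(1, "9 Elm Ave", date(2021, 6, 1), None),
]
items = [
    LineItem(100, 1, date(2021, 3, 15)),
    LineItem(101, 1, date(2022, 1, 10)),
    LineItem(102, 2, date(2021, 1, 1)),
]
print(as_of_join(items, addresses))
# → [(100, '12 Oak St'), (101, '9 Elm Ave'), (102, None)]
```

Treating `valid_to` as exclusive means a line item dated exactly on a changeover day matches only the newer version, which avoids double-counting when the ranges are stored back to back.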
That said, Dataiku DSS can sit on top of a number of underlying storage systems, such as managed folders, Snowflake, HDFS, S3 and the like. I’m not sure whether any of these has built-in “magic” that makes this kind of join easier.
Looking forward to hearing what others are thinking about this question.
-
tgb417
Here is a bit of a response from the Snowflake folks about this question: https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1 However, I don’t know how this approach will work with the Dataiku DSS built-in visual recipes.
-
Ankur30
@tgb417
Thanks for your response. With visual recipes it is difficult to implement SCD2 in Dataiku, unlike in other ETL tools such as Informatica (using the Lookup and Update Strategy transformations). And yes, the other solutions are valid: SCD2 can be implemented at the database level in Snowflake, Oracle, SQL Server, etc.
This is something the Dataiku community needs to look at: how we can leverage Dataiku to implement SCD2.
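To make the “lookup and update strategy” pattern concrete, here is a minimal plain-Python sketch of one SCD2 load, the kind of logic one might put in a code recipe when visual recipes fall short. Everything here is an assumption for illustration: the `DimRow` layout, the idea of comparing a tuple of tracked attributes to detect changes, and the choice to leave keys that disappear from the source as current (no delete handling). The “lookup” is the dictionary of current rows by key; the “update strategy” is the expire-and-insert branch.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


# Hypothetical SCD2 dimension row.
@dataclass(frozen=True)
class DimRow:
    key: int                   # business/natural key
    attrs: tuple               # tracked attributes; any difference creates a new version
    valid_from: date
    valid_to: Optional[date]   # None while the version is current
    is_current: bool


def scd2_merge(dimension, incoming, load_date):
    """Apply one SCD type 2 load and return the new dimension rows.

    dimension: list of DimRow (existing history)
    incoming:  dict mapping key -> attrs (snapshot from the source)
    load_date: effective date of this load
    """
    # "Lookup": index current versions by key.
    current = {r.key: r for r in dimension if r.is_current}
    # Expired history is carried over unchanged.
    out = [r for r in dimension if not r.is_current]

    for key, row in current.items():
        new_attrs = incoming.get(key)
        if new_attrs is None or new_attrs == row.attrs:
            # Unchanged, or absent from the source (deletes not handled here).
            out.append(row)
        else:
            # "Update strategy": expire the old version, insert a new current one.
            out.append(replace(row, valid_to=load_date, is_current=False))
            out.append(DimRow(key, new_attrs, load_date, None, True))

    # Keys never seen before get their first version.
    for key, new_attrs in incoming.items():
        if key not in current:
            out.append(DimRow(key, new_attrs, load_date, None, True))
    return out


# Example load (data invented for illustration).
dim = [
    DimRow(1, ("12 Oak St",), date(2020, 1, 1), None, True),
    DimRow(2, ("5 Pine Rd",), date(2020, 1, 1), None, True),
]
snapshot = {1: ("9 Elm Ave",), 2: ("5 Pine Rd",), 3: ("7 Birch Ln",)}
result = scd2_merge(dim, snapshot, date(2021, 6, 1))
for r in result:
    print(r)
```

Key 1 changed, so its old row is closed on the load date and a new current row is added; key 2 is untouched; key 3 is inserted as new. The same steps map roughly onto SQL as an `UPDATE` of changed current rows followed by an `INSERT` of new versions, which is the database-level alternative mentioned above.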
Regards,
Ankur
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @Ankur30
Please feel free to use the Product Ideas board. The Product Ideas board is here to let you share and exchange your ideas on how to improve Dataiku. Here are some resources to help get you started: Suggest an idea. I hope this helps!
-
tgb417
I've just started a project that has a slowly changing dimension component. @Ankur30, have you made any further progress on this topic since we last spoke? Has anyone else made progress?
Here is a bit about my use case.