I want to implement SCD type 2 in Dataiku. Is it possible to implement it using Dataiku visual recipes?
And what are the alternatives for implementing it?
Most of my work with DSS has involved taking transactional data and building models on it. My source data is maintained in SCD type 2 form. My approach is to first understand how each source stores its history, so that when I join two data sources I can reconstruct the data “as of” a particular time. In my case, that means using the visual Join recipe to join things like addresses to line items, taking into account the date range during which a particular address record was active and the date of the line item. So this can be done with visual recipes. However, the way I have done it is by and large manual, and it depends on how the Join recipes are set up.
That said, Dataiku DSS can work with a number of underlying data stores, such as managed folders, Snowflake, HDFS, and S3. I’m not sure whether any of these has “magic” that makes this type of join easier.
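To make the “as of” join a bit more concrete, here is a minimal sketch in plain Python of the date-range matching described above. The column names (`customer_id`, `valid_from`, `valid_to`, `order_date`) and the sample rows are hypothetical, and the sketch assumes the currently active version of a record has an empty end date. It only illustrates the matching logic; it is not how the Join recipe is implemented.

```python
from datetime import date

# Hypothetical SCD type 2 address dimension: one row per version of an address.
# valid_to=None marks the currently active version (an assumption of this sketch).
addresses = [
    {"customer_id": 1, "city": "Boston",
     "valid_from": date(2020, 1, 1), "valid_to": date(2021, 6, 1)},
    {"customer_id": 1, "city": "Denver",
     "valid_from": date(2021, 6, 1), "valid_to": None},
]

# Hypothetical transactional rows, each with its own event date.
line_items = [
    {"order_id": 100, "customer_id": 1, "order_date": date(2020, 3, 15)},
    {"order_id": 101, "customer_id": 1, "order_date": date(2022, 2, 1)},
]


def address_as_of(customer_id, as_of, dimension):
    """Return the dimension row active for customer_id on as_of, or None."""
    for row in dimension:
        if row["customer_id"] != customer_id:
            continue
        starts_before = row["valid_from"] <= as_of
        # Half-open interval: a version is active up to, but not including, valid_to.
        ends_after = row["valid_to"] is None or as_of < row["valid_to"]
        if starts_before and ends_after:
            return row
    return None


def join_as_of(facts, dimension):
    """Left-join each fact row to the address version active on its order_date."""
    joined = []
    for item in facts:
        match = address_as_of(item["customer_id"], item["order_date"], dimension)
        joined.append({**item, "city": match["city"] if match else None})
    return joined


joined = join_as_of(line_items, addresses)
```

With the sample rows, order 100 picks up the older address version and order 101 the current one. The half-open interval (start inclusive, end exclusive) is a design choice that keeps a line item dated exactly on a changeover day from matching two versions.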
Looking forward to hearing what others are thinking about this question.
Here is a response from the Snowflake folks on this question: https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-U... However, I don’t know how well this approach will play with the visual recipes built into Dataiku DSS.
@tgb417 Thanks for your response. With visual recipes, it is difficult to implement SCD2 in Dataiku, unlike in other ETL tools such as Informatica (using lookup and update strategy).
Yes, the other solutions are valid: SCD2 can be implemented at the database level, in Snowflake, Oracle, SQL Server, etc.
This is something the Dataiku community needs to look at: how we can leverage Dataiku to implement SCD2.
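For anyone following along, the lookup-and-update logic behind an SCD2 load can be sketched in plain Python, roughly the kind of thing one might adapt inside a code recipe. This is only a sketch under assumptions: the function name `apply_scd2`, the `valid_from`/`valid_to` columns, and the sample data are all hypothetical, the incoming data is treated as a full snapshot of current values, and deleted keys are not handled.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Return a new SCD2 dimension after applying a snapshot of incoming rows.

    dimension: list of dicts with key, tracked columns, valid_from, valid_to
    incoming:  list of dicts with key and tracked columns (current values)
    Rows with valid_to=None are treated as the active version (an assumption).
    """
    # Copy the rows so the caller's dimension is left untouched.
    result = [dict(row) for row in dimension]
    # Lookup step: index the active version of each key.
    active = {row[key]: row for row in result if row["valid_to"] is None}

    for new in incoming:
        old = active.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # No tracked column changed: keep the active version.
        if old is not None:
            old["valid_to"] = load_date  # Expire the previous version.
        # Insert a new active version for changed or brand-new keys.
        version = {key: new[key]}
        version.update({c: new[c] for c in tracked})
        version["valid_from"] = load_date
        version["valid_to"] = None
        result.append(version)
    return result


# Hypothetical example: customer 1 moves, customer 2 is new.
dimension = [
    {"customer_id": 1, "city": "Boston",
     "valid_from": date(2020, 1, 1), "valid_to": None},
]
incoming = [
    {"customer_id": 1, "city": "Denver"},
    {"customer_id": 2, "city": "Austin"},
]
updated = apply_scd2(dimension, incoming, "customer_id", ["city"], date(2021, 6, 1))
```

After the call, `updated` holds three rows: the expired Boston version, a new active Denver version for customer 1, and a first version for customer 2. On a SQL database the same expire-then-insert pattern is typically written as an update or merge followed by an insert.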
Hi @Ankur30, please feel free to use the Product Ideas board. It is there to let you share and exchange ideas on how to improve Dataiku. Here are some resources to help you get started:
How to suggest Dataiku ideas
Participating on the Product Ideas board
Suggest an idea
I hope this helps!
I've just started a project that has a slowly changing dimension component. @Ankur30, have you made any further progress on this topic since we last spoke?
Has anyone else made progress?
Here is a bit about my use case.