Implementing SCD2 (slowly changing dimension) in Dataiku

Ankur30
Level 3

Hi,

 

I want to implement SCD type 2 in Dataiku. Is it possible to implement it using Dataiku visual recipes?

And what are the alternatives for implementing it?

 

Regards,

Ankur

Ankur30
Level 3
Author

 

 

Good morning

 

Any updates on this?

Regards,

Ankur.

tgb417

@Ankur30 

Most of the work I've done with DSS has been around taking transactional data and implementing models. In my source data, SCD type 2 history is already maintained. The approach I've used is to first understand the nature of the challenge, then make sure that when I join two different data sources I know how each one stores its history, so that I can come up with the data in SCD type 2 form "as of" a particular time. In my case this means using the visual join recipes to join things like address to line items, taking into account the date range during which a particular address record was active and the date of the line item. So this can be done with visual recipes. But the way I have done it is by and large manual, and relies on the join recipes.
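For readers who want to see the "as of" join spelled out, here is a minimal pandas sketch of the same idea. It is not taken from the project described above: the table layout, the column names (`valid_from`, `valid_to`, `order_date`), the far-future end date and the helper `join_as_of` are all assumptions made for illustration. In DSS the same logic would typically be a join on the key plus a date-range condition.

```python
import pandas as pd

# Hypothetical SCD2 address dimension: one row per version of an address.
# The open-ended current version uses an assumed far-future end date.
addresses = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "city": ["Boston", "Albany", "Denver"],
    "valid_from": pd.to_datetime(["2020-01-01", "2021-06-01", "2019-03-15"]),
    "valid_to": pd.to_datetime(["2021-06-01", "2099-12-31", "2099-12-31"]),
})

# Hypothetical transactional data.
line_items = pd.DataFrame({
    "item_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2020-05-20", "2022-02-01", "2020-07-04"]),
})

def join_as_of(facts, dim, key, date_col):
    """Attach to each fact row the dimension version active on date_col."""
    # Join every fact to every version of its key, then keep only the
    # version whose [valid_from, valid_to) range contains the fact date.
    joined = facts.merge(dim, on=key, how="left")
    in_range = (joined[date_col] >= joined["valid_from"]) & (
        joined[date_col] < joined["valid_to"]
    )
    # Note: facts with no matching version are dropped by this filter.
    return joined[in_range].reset_index(drop=True)

result = join_as_of(line_items, addresses, "customer_id", "order_date")
print(result[["item_id", "city"]])
```

The half-open range (start inclusive, end exclusive) is a design choice that keeps a fact dated exactly on a change day from matching two versions.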

That said, Dataiku DSS can leverage a number of underlying data repositories like managed folders, Snowflake, HDFS, S3 and the like. I'm not clear whether any of these have "magic" that makes this type of connection easier.

Looking forward to hearing what others are thinking about this question.

--Tom
tgb417

Here is a bit of a response from the Snowflake folks about this question. https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-U... However, I don't know how this approach will play with the Dataiku DSS built-in visual recipes.

--Tom
Ankur30
Level 3
Author

@tgb417 Thanks for your response. With visual recipes it is difficult to implement SCD2 in Dataiku, unlike in other ETL tools such as Informatica (using lookup and update strategy).

Yes, the other solutions are valid: SCD2 can be implemented at the database level in Snowflake, Oracle, SQL Server, etc.

It is something the Dataiku community needs to look at: how we can leverage Dataiku to implement SCD2.
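As one possible alternative to a database-level implementation, the SCD2 update itself can be sketched in pandas, which could in principle be run from a code recipe. This is only a rough sketch under stated assumptions, not an established Dataiku pattern: the helper `apply_scd2`, the columns `valid_from`, `valid_to` and `is_current`, and the `OPEN_END` sentinel are all hypothetical. It does not handle keys that disappear from the source, and it treats null values as changes.

```python
import pandas as pd

OPEN_END = pd.Timestamp("2099-12-31")  # assumed marker for "still current"

def apply_scd2(dim, snapshot, key, tracked, load_date):
    """Close changed current rows and append new versions (SCD type 2)."""
    dim = dim.copy()
    current = dim[dim["is_current"]]
    # Look up the current version of each incoming key.
    merged = snapshot.merge(
        current[[key] + tracked], on=key, how="left",
        suffixes=("", "_old"), indicator=True,
    )
    is_new = merged["_merge"] == "left_only"
    is_changed = pd.Series(False, index=merged.index)
    for col in tracked:
        is_changed |= (~is_new) & (merged[col] != merged[col + "_old"])
    # Expire the current version of every changed key.
    to_close = dim["is_current"] & dim[key].isin(merged.loc[is_changed, key])
    dim.loc[to_close, "valid_to"] = load_date
    dim.loc[to_close, "is_current"] = False
    # Insert a fresh current version for new and changed keys.
    new_rows = merged.loc[is_new | is_changed, [key] + tracked].copy()
    new_rows["valid_from"] = load_date
    new_rows["valid_to"] = OPEN_END
    new_rows["is_current"] = True
    return pd.concat([dim, new_rows], ignore_index=True)

# Hypothetical data: customer 1 moves, customer 3 is new.
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Boston", "Denver"],
    "valid_from": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "valid_to": [OPEN_END, OPEN_END],
    "is_current": [True, True],
})
snapshot = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Albany", "Denver", "Austin"],
})
load_date = pd.Timestamp("2021-06-01")
updated = apply_scd2(dim, snapshot, "customer_id", ["city"], load_date)
print(updated)
```

The lookup step plays the role of the lookup in the Informatica comparison above, and the close-then-insert step plays the role of the update strategy.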

Regards,

Ankur

 

 
 
CoreyS
Dataiker Alumni

Hi @Ankur30  please feel free to utilize the Product Ideas board. The Product Ideas board is here to let you share and exchange your ideas on how to improve Dataiku. Here are some resources to help get you started:

How to suggest Dataiku ideas 
Participating on the Product Ideas board 
Suggest an idea

 

I hope this helps!

tgb417

I've just started a project that has a slowly changing dimension component. @Ankur30, have you made any further progress on this topic since we last spoke?

Has anyone else made progress?

Here is a bit about my use case.

https://community.dataiku.com/t5/Using-Dataiku/Maintaining-Sync-with-Slow-datasets-Adds-and-Modify/m...

 

--Tom