
Core Designer Certificate: ending up with too many rows after data Join

francisco
Level 1

Hey all,

I am trying to left-join data for the Core Designer Certificate project, but I keep ending up with too many rows. I want to join on both Country and Year, since that combination is unique in all the input tables. After joining, however, I get three rows per Country-Year combination.

I have tried changing all kinds of options in the recipe, and I have also tried a Fuzzy Join, but I always get duplicates. I also tried joining just two of the three datasets first, but that did not help me figure out what is happening.

Thanks!

2 Replies
SeanA
Community Manager

Hi @francisco, in my own copy of the project I have the same settings for the Join step, so you should be OK there. I don't have the post-filter step that you seem to have, but I do have a pre-filter step. That might be where to look next.

francisco
Level 1
Author

Hi Sean,

Thanks for the reply. It turned out that something had gone wrong when DSS read in one of the three input datasets: when I inspected it directly, I could see three rows per Country-Year combination. I started over from scratch, and with the data loaded correctly I was able to complete the project.

Thanks!

Francisco
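
For anyone who runs into the same symptom: the check described above (inspecting an input directly for repeated Country-Year combinations) can also be done in code. The following is only a minimal sketch in pandas, outside of any DSS recipe. The toy data, the `gdp` and `population` columns, and the `duplicated_keys` helper are all made up for illustration and are not part of the certificate project.

```python
import pandas as pd


def duplicated_keys(df, keys):
    """Return each key combination that occurs more than once, with its row count."""
    counts = df.groupby(keys).size().reset_index(name="n_rows")
    return counts[counts["n_rows"] > 1].reset_index(drop=True)


# Hypothetical toy inputs: `right` was "read in wrong" and holds
# three copies of the (FR, 2019) row.
left = pd.DataFrame({
    "Country": ["FR", "FR", "DE"],
    "Year": [2019, 2020, 2019],
    "gdp": [1.0, 1.1, 2.0],
})
right = pd.DataFrame({
    "Country": ["FR", "FR", "FR", "FR", "DE"],
    "Year": [2019, 2019, 2019, 2020, 2019],
    "population": [67, 67, 67, 67, 83],
})

keys = ["Country", "Year"]

# Check the join keys of the suspect input before joining.
dupes = duplicated_keys(right, keys)
print(dupes)

# A left join repeats each left row once per matching right row,
# so the duplicated key inflates the output row count.
joined = left.merge(right, on=keys, how="left")
print(len(left), "rows in, ", len(joined), "rows out")
```

If the duplicated-key table comes back non-empty for an input that is supposed to be unique on the join keys, the extra rows after the join come from that input rather than from the join settings. In pandas, passing `validate="one_to_one"` to `merge` makes the same assumption explicit and raises an error when it does not hold.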