Core Designer Certificate: ending up with too many rows after a data join
Hey all,
I am trying to left join data for the Core Designer Certificate project, but I keep ending up with too many rows. I want to join on both Country and Year, since that combination is unique in all the input tables. But after joining, I keep ending up with three rows per Country-Year combination.
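For anyone else hitting this, here is why a left join can return more rows than the left table. It is a minimal pure-Python sketch of generic left-join semantics, not the DSS Join recipe itself, and the column names (`gdp`, `pop`) and data are made up. Every right-hand row that matches a left row's Country-Year key produces one output row. If one input holds three copies of each key, each left row comes out three times.

```python
from collections import defaultdict

def left_join(left, right, keys):
    """Left-join two lists of dicts on the given key columns (a simple hash join)."""
    # Index the right-hand rows by their composite key; one key can map to several rows.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)

    joined = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys))
        if not matches:
            # Left join: an unmatched left row is kept once, without right-hand columns.
            joined.append(dict(row))
        else:
            # Each matching right-hand row yields its own output row.
            for match in matches:
                joined.append({**row, **match})
    return joined

# Hypothetical data: the right table accidentally holds three copies of each key.
left = [{"Country": "France", "Year": 2015, "gdp": 1.0},
        {"Country": "Chile", "Year": 2015, "gdp": 2.0}]
right = [{"Country": c, "Year": 2015, "pop": p}
         for c in ("France", "Chile") for p in (10, 10, 10)]

result = left_join(left, right, keys=("Country", "Year"))
print(len(left), len(result))  # 2 6
```

Changing recipe options cannot fix this, because the join is behaving correctly. The extra rows come from the input.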
I have tried changing all kinds of options in the recipe, and I have also tried a Fuzzy Join, but I always get duplicates. I also tried joining just two of the three datasets first, but that did not help me figure out what was happening.
Thanks!
Answers
Sean, Dataiker (Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer). Posts: 168
Hi @francisco, in my own copy of the project I have the same settings for the Join step, so you should be OK there. I don't have the post-filter step that you seem to have, but I do have a pre-filter step. That might be where to look next.
Hi Sean,
Thanks for the reply. It turned out that something had gone wrong when DSS read in one of the three input datasets; when I inspected that dataset directly, I could see three rows per Country-Year combination. I started over from scratch, and with the data loaded correctly I was able to complete the project.
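A quick way to catch this before joining is to count how often each Country-Year key appears in every input. Below is a minimal pure-Python sketch. The dataset names (`input_a`, `input_b`) and rows are hypothetical, and it runs outside DSS on rows you have already exported. Any key with a count above 1 will fan out in a left join.

```python
from collections import Counter

def duplicate_keys(rows, keys):
    """Return {key: count} for every composite key that appears more than once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical inputs: the second dataset was read in with every row tripled.
datasets = {
    "input_a": [{"Country": "France", "Year": 2015}, {"Country": "Chile", "Year": 2015}],
    "input_b": [{"Country": "France", "Year": 2015}] * 3
               + [{"Country": "Chile", "Year": 2015}] * 3,
}

for name, rows in datasets.items():
    dupes = duplicate_keys(rows, keys=("Country", "Year"))
    status = "unique" if not dupes else f"{len(dupes)} duplicated keys"
    print(name, status)
# input_a unique
# input_b 2 duplicated keys
```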
Thanks!
Francisco