Core Designer Certificate: ending up with too many rows after data Join

francisco Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭

Hey all,

I am trying to left-join the data sets for the Core Designer Certificate project, but I keep ending up with too many rows. I want to join on both Country and Year, since that combination is unique in all the input tables. After joining, though, I get three rows per Country-Year combination.

I have tried changing all kinds of options in the recipe, and I have also tried Fuzzy Join, but I always get duplicates. I also tried joining just two of the three data sets first, but that did not help me figure out what is happening.

Thanks!
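
For readers who hit the same symptom, here is a minimal sketch, outside DSS, of how a left join produces extra rows when one input repeats a key. It uses pandas rather than the DSS Join recipe. The column names Country and Year come from the post; the values, the `gdp` and `population` columns, and the variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical data: column names follow the post, values are invented.
left = pd.DataFrame({
    "Country": ["France", "France", "Spain"],
    "Year": [2019, 2020, 2019],
    "gdp": [1.0, 2.0, 3.0],
})

# A right-hand table in which every Country-Year pair is unique...
right_unique = pd.DataFrame({
    "Country": ["France", "France", "Spain"],
    "Year": [2019, 2020, 2019],
    "population": [10, 11, 20],
})
# ...and the same table with every row accidentally present three times.
right_tripled = pd.concat([right_unique] * 3, ignore_index=True)

# A left join keeps every left row once per matching right row,
# so repeated keys on the right multiply the output.
joined = left.merge(right_tripled, on=["Country", "Year"], how="left")
rows_per_key = joined.groupby(["Country", "Year"]).size()

print(len(left), len(joined))
print(rows_per_key)
```

If the join settings look right but the output still has several rows per key, this pattern suggests inspecting each input on its own for repeated keys.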

Answers

  • Sean
    Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker

    Hi @francisco, in my own copy of the project I have the same settings for the Join step, so you should be OK there. I don't have the post-filter step that you seem to have, but I do have a pre-filter step. That might be where to look next.

  • francisco
    francisco Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭

    Hi Sean,

    Thanks for the reply. It turned out that DSS had read in one of the three input data sets incorrectly: when I inspected it directly, I could see three rows per Country-Year combination. I started over from scratch, and with the data loaded correctly I was able to complete the project.

    Thanks!

    Francisco
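
The check described in the reply above (inspecting an input directly for repeated Country-Year combinations) could be sketched in pandas as follows. This is an illustration under assumptions, not the DSS workflow: `duplicate_keys`, `KEYS`, and the sample frames are hypothetical names and data, and pandas stands in for the visual recipe.

```python
import pandas as pd

KEYS = ["Country", "Year"]  # join keys named in the thread

def duplicate_keys(df, keys=KEYS):
    """Return the key combinations that occur more than once, with their row counts."""
    counts = df.groupby(keys).size().reset_index(name="n_rows")
    return counts[counts["n_rows"] > 1].reset_index(drop=True)

# Invented sample inputs: one clean, one with every row loaded three times.
good = pd.DataFrame({"Country": ["France", "Spain"], "Year": [2019, 2019], "value": [1, 2]})
bad = pd.concat([good] * 3, ignore_index=True)

print(duplicate_keys(good))  # no repeated keys
print(duplicate_keys(bad))   # each key listed with its row count

# pandas can also refuse the join outright when keys are not unique.
rejected = False
try:
    good.merge(bad, on=KEYS, how="left", validate="one_to_one")
except pd.errors.MergeError as exc:
    rejected = True
    print("join rejected:", exc)
```

Running a check like this on each input before joining shows which data set carries the repeated keys, which is quicker than varying the join options.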
