Core Designer Certificate: ending up with too many rows after data Join

francisco
francisco Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭

Hey all,

I am trying to left-join the data for the Core Designer Certificate project, but I keep ending up with too many rows. I am joining on both Country and Year, since that combination is unique in all the input tables. After the join, however, I get three rows per Country-Year combination.

I have tried changing all kinds of options in the recipe, and I have also tried Fuzzy Join, but I always get duplicates. I also tried joining just two of the three datasets first, but that did not help me figure out what is happening.

Thanks!

Answers

  • Sean
    Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker

    Hi @francisco, in my own copy of the project I have the same settings for the Join step, so you should be OK there. I don't have the post-filter step that you seem to have, but I do have a pre-filter step. That might be where to look next.

  • francisco
    francisco Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭

    Hi Sean,

    Thanks for the reply. It turned out that one of the three input datasets had been read in incorrectly by DSS: when I inspected it directly, I could see three rows per Country-Year combination. I started over from scratch, and with the data loaded correctly I was able to complete the project.

    Thanks!

    Francisco
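
For anyone hitting the same symptom, the check described above (inspecting an input for repeated Country-Year combinations before joining) can be sketched in plain Python. This is only an illustration, not part of the certificate project: the sample rows, the `gdp` and `pop` columns, and the helper names `duplicate_keys` and `left_join` are all hypothetical, and the join is a naive stand-in for what a join recipe does, not the DSS implementation.

```python
from collections import Counter, defaultdict


def duplicate_keys(rows, keys):
    """Return {key_tuple: count} for every key combination seen more than once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def left_join(left, right, keys):
    """Naive left join: one output row per matching right row,
    or a single row with only the left columns when nothing matches."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [{}])
        for match in matches:
            joined.append({**row, **match})
    return joined


# Hypothetical sample data: the right-hand input was loaded with
# the FR/2019 row repeated three times.
keys = ["Country", "Year"]
left = [
    {"Country": "FR", "Year": 2019, "gdp": 1.0},
    {"Country": "DE", "Year": 2019, "gdp": 2.0},
]
right = [{"Country": "FR", "Year": 2019, "pop": 67}] * 3 + [
    {"Country": "DE", "Year": 2019, "pop": 83}
]

dupes = duplicate_keys(right, keys)   # non-empty means the key is not unique
joined = left_join(left, right, keys)  # duplicated keys multiply the output rows

print(dupes)
print(len(left), "left rows ->", len(joined), "joined rows")
```

If `duplicate_keys` returns anything for an input, the join key is not actually unique there, and no join setting will remove the extra rows; the input itself has to be fixed or deduplicated first.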
