Join recipe processing too slow

manhnam91

I am using a Join recipe to join two datasets of about 100 million records each. Both datasets are stored as CSV files, and the join takes about 3 hours. Is there a way to speed this up?


Operating system used: Ubuntu 18.04

Answers

  • Manuel

    Hi,

    Is the recipe running on the DSS engine (check below the Run button)? On what kind of machine is DSS installed?

    If you have a Database connection, you can Sync both datasets to the database first and then run the recipe on the SQL engine.
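    To illustrate the idea outside of DSS, here is a minimal sketch of "load both CSVs into a database, then let the database do the join". It is not Dataiku's API: it uses Python's built-in sqlite3 module, and the table names, column names, and sample rows are hypothetical stand-ins for the real datasets.

```python
import csv
import io
import sqlite3

# Hypothetical tiny stand-ins for the two large CSV datasets.
ORDERS_CSV = "order_id,customer_id,amount\n1,10,9.5\n2,11,20.0\n3,10,5.25\n"
CUSTOMERS_CSV = "customer_id,name\n10,Alice\n11,Bob\n"


def load_csv(conn, table, text):
    """Create `table` from CSV text (the 'sync to the database' step)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    # The remaining rows of the reader are inserted as text values.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def join_in_database():
    """Load both datasets, then run the join inside the database engine."""
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "orders", ORDERS_CSV)
    load_csv(conn, "customers", CUSTOMERS_CSV)
    # Index the join key so the engine can look up matches instead of scanning.
    conn.execute("CREATE INDEX idx_customers_id ON customers (customer_id)")
    rows = conn.execute(
        "SELECT o.order_id, c.name, o.amount "
        "FROM orders AS o JOIN customers AS c "
        "ON o.customer_id = c.customer_id "
        "ORDER BY o.order_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in join_in_database():
        print(row)
```

    In DSS the equivalent would be two Sync recipes into the database connection followed by the Join recipe set to the SQL engine. Whether that is faster in practice depends on the database and the machine it runs on.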

    I hope this helps.
