Join recipe processing too slow


I am using a Join recipe to join two datasets of about 100 million rows each. Both datasets are stored as CSV files, and the join takes about 3 hours. Is there a way to speed this up?


Operating system used: Ubuntu 18.04

Answers


    Hi,

    Is the recipe running on the DSS engine (check the engine setting below the Run button)? What kind of machine is DSS installed on?

    If you have a database connection, you can first Sync both datasets to the database and then run the Join recipe on the SQL engine, so that the join is executed by the database rather than by DSS itself.
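    The following is not DSS code. It is a minimal stand-alone sketch of the same idea (load both datasets into a database, then let the database perform the join), using Python's built-in sqlite3 module as a stand-in for a DSS database connection. The names `join_csv_in_sqlite`, `left_t` and `right_t` are hypothetical, and the sketch assumes both CSV inputs have a header row and share the join column. At 100 million rows, a server database behind a DSS connection would be the realistic target, not an in-memory SQLite file.

```python
import csv
import sqlite3
from typing import Iterable, List, Tuple


def _load_table(conn: sqlite3.Connection, table: str, lines: Iterable[str]) -> List[str]:
    """Create `table` from CSV lines (first line is the header) and bulk-insert the rows."""
    reader = csv.reader(lines)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    # executemany consumes the reader lazily, so a file object is streamed row by row
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    return header


def join_csv_in_sqlite(
    left_lines: Iterable[str],
    right_lines: Iterable[str],
    key: str,
    db_path: str = ":memory:",
) -> List[Tuple]:
    """Hypothetical helper: load two CSV inputs into SQLite and inner-join them on `key`.

    `left_lines` / `right_lines` can be open text files or any iterable of CSV lines.
    """
    conn = sqlite3.connect(db_path)
    try:
        _load_table(conn, "left_t", left_lines)
        _load_table(conn, "right_t", right_lines)
        # An index on the join key gives the database an efficient lookup path for matches
        conn.execute(f'CREATE INDEX idx_right_key ON right_t ("{key}")')
        conn.commit()
        # The join itself runs inside the database engine, not in Python
        query = (
            f'SELECT l.*, r.* FROM left_t AS l '
            f'JOIN right_t AS r ON l."{key}" = r."{key}"'
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

    For example, `join_csv_in_sqlite(open("a.csv", newline=""), open("b.csv", newline=""), "id")` would join two files on an `id` column. Returning a full list is only reasonable for small results; for large outputs you would write the result back to a table instead of fetching it into memory.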

    I hope this helps.
