
Join recipe processing too slow

manhnam91
Level 2

I am using a Join recipe to join two datasets of about 100 million records each. Both datasets are stored as CSV files, and the join takes about 3 hours. Is there a way to speed this up?


Operating system used: Ubuntu 18.04

Manuel
Dataiker

Hi,

Is the recipe running on the DSS engine (the selected engine is shown below the Run button)? What kind of machine is DSS installed on?

If you have a database connection, you can first Sync both datasets to the database and then run the Join recipe on the SQL engine, so that the database performs the join.
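To illustrate the idea outside of DSS, here is a minimal Python sketch of the same pattern: copy both CSV inputs into database tables, then let the database run the join. It uses an in-memory SQLite database as a stand-in for a real database connection. The table names (`left_ds`, `right_ds`), the helper functions, and the sample data are hypothetical, and this is not how DSS implements the Sync recipe or the SQL engine.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Copy a CSV file object into a new table (values are stored as text)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def join_in_database(left_csv, right_csv, key):
    """Load both inputs into SQLite and run an inner join on `key` in SQL."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database connection
    load_csv(conn, "left_ds", left_csv)
    load_csv(conn, "right_ds", right_csv)
    # An index on the join key lets the database avoid a full scan per row.
    conn.execute(f'CREATE INDEX idx_right_key ON right_ds ("{key}")')
    rows = conn.execute(
        f'SELECT l.*, r.* FROM left_ds AS l '
        f'JOIN right_ds AS r ON l."{key}" = r."{key}" '
        f'ORDER BY l."{key}"'
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data standing in for the two CSV datasets.
left = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n")
right = io.StringIO("id,amount\n1,10\n3,30\n4,40\n")
rows = join_in_database(left, right, "id")
```

With 100 million rows per dataset you would use a server database rather than in-memory SQLite, but the principle is the same: once both tables live in the database, the join runs there instead of reading and parsing the CSV files.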

I hope this helps.