
Join processor: behavior when dataset bigger than RAM

Level 2

Hi,

In a prepare recipe, when I use the processor "Join with other dataset (memory-based)", what happens if the other dataset is bigger than the available RAM?

Does the join fail silently or does it raise an error?

Many thanks for your answer.

 

4 Replies
Dataiker

@Chiktika 

The datasets are not held entirely in memory unless they are small; instead, the data is streamed in batches. This avoids crashes when dealing with large datasets.

I hope this clarifies!

Edit: The above is true when you build a join recipe. The join processor inside a prepare recipe, however, does work in memory. In that case, your limit is not the host memory but the Java heap size allocated under Xmx, as outlined here.

Because of this, we do not recommend joins inside a prepare recipe unless the datasets involved are small.
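
To illustrate why the other dataset has to fit in the allocated memory, here is a minimal Python sketch of a lookup-style join. It is not Dataiku's implementation, only an illustration of the general technique; the function names `build_lookup` and `memory_based_left_join` and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def build_lookup(other_rows: Iterable[dict], key: str) -> Dict[str, dict]:
    """Load the whole 'other' dataset into a dict keyed on the join column.

    This is the step whose memory cost grows with the size of the other
    dataset: every row stays in memory until the join is finished.
    """
    lookup: Dict[str, dict] = {}
    for row in other_rows:
        # Assumption of this sketch: on duplicate keys, the last row wins.
        lookup[row[key]] = row
    return lookup


def memory_based_left_join(
    main_rows: Iterable[dict], lookup: Dict[str, dict], key: str
) -> Iterator[dict]:
    """Stream the main dataset row by row and enrich it from the lookup."""
    for row in main_rows:
        merged = dict(row)
        match = lookup.get(row[key], {})
        for column, value in match.items():
            if column != key:
                merged[column] = value
        yield merged


# Hypothetical sample data.
main = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
other = [{"id": "1", "city": "Paris"}]

lookup = build_lookup(other, "id")
result = list(memory_based_left_join(main, lookup, "id"))
```

In this sketch only the main dataset is streamed. The other dataset is fully materialized in `lookup`, which is why its size is bounded by the memory available to the process rather than by disk.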

Level 2
Author

Many thanks @Liev, it's wonderful.

Dataiker

@Chiktika I've updated my answer above to clarify and correct.

 

Level 2
Author

Thanks @Liev for the update. Reading the backend section, I think I found my answer:


If the memory allocation is insufficient, the DSS backend may crash .... and running jobs/scenarios to be aborted.


 
