Join processor: behavior when dataset bigger than RAM
Hi,
In a prepare recipe, when I use the processor "Join with other dataset (memory-based)", what happens if the other dataset is bigger than the available RAM?
Does the join fail silently or does it raise an error?
Many thanks for your answer.
Answers
-
The datasets are not held entirely in memory (unless they are small); instead, the data is streamed in batches. This is done to avoid crashes when dealing with large datasets.
I hope this clarifies!
Edit: The above is true when you build a join recipe. For the join processor inside the prepare recipe, the join is indeed done in memory. In this case, your limit is not the host memory but the memory allocated to the DSS backend via the Xmx setting, as outlined here.
Because of this, we do not recommend joins in a prepare recipe unless the datasets involved are on the smaller side.
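For intuition, here is a minimal sketch in plain Python (not DSS internals; file and column names are placeholders) of what a memory-based join processor conceptually does: the entire "other" dataset is loaded into a lookup table, so memory use scales with that dataset rather than with the main one.

```python
import csv

def build_lookup(other_csv_path, key_column):
    """Load the whole 'other' dataset into memory, keyed by the join column."""
    lookup = {}
    with open(other_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            lookup[row[key_column]] = row          # whole dataset kept in RAM
    return lookup

def join_stream(main_rows, lookup, key_column):
    """Stream the main dataset row by row; only the lookup table stays resident."""
    for row in main_rows:
        match = lookup.get(row[key_column], {})    # left-join style: no match -> no extra columns
        yield {**row, **match}
```

By contrast, the join recipe streams both sides in batches, so neither dataset needs to fit entirely in the heap.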
-
Many thanks @Liev, it's wonderful!
-
@Chiktika I've updated my answer above to clarify and correct.
-
Thanks @Liev for the update. Reading the backend section, I think I found my answer: "If the memory allocation is insufficient, the DSS backend may crash ... and running jobs/scenarios to be aborted."
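For reference, a rough way to gauge whether the other dataset is small enough before relying on the memory-based processor is to estimate its in-memory size in a notebook. This is only a sketch: the dataset name is a placeholder, it assumes the dataiku Python API's Dataset.iter_dataframes() is available in your version, and the pandas figure only approximates the backend's actual JVM footprint.

```python
import dataiku

# Placeholder dataset name; chunked reading avoids loading everything at once.
ds = dataiku.Dataset("other_dataset")
total_bytes = 0
for chunk in ds.iter_dataframes(chunksize=100_000):
    total_bytes += int(chunk.memory_usage(deep=True).sum())

print(f"Approximate in-memory size: {total_bytes / 1024 ** 2:.0f} MiB")
# Leave generous headroom versus the backend Xmx allocation: the JVM
# representation and the rest of the backend also need memory.
```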