Join processor: behavior when dataset bigger than RAM
Hi,
In a prepare recipe, when I use the processor "Join with other dataset (memory-based)", what happens if the other dataset is bigger than the available RAM?
Does the join fail silently or does it raise an error?
Many thanks for your answer.
Answers
-
The datasets are not held entirely in memory (unless they are small); instead, the data is streamed in batches. This is done to avoid crashes when dealing with large datasets.
I hope this clarifies!
Edit: The above is true when you build a join recipe. For the join processor inside the prepare recipe, the join is indeed done in memory. In this case, your limit is not the host memory but the memory allocated to the DSS backend via the Xmx setting, as outlined here.
Because of this, we do not recommend joins in a prepare recipe unless the datasets involved are on the smaller side.
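For intuition, here is a minimal sketch in plain Python (not DSS internals; file and column names are placeholders) of what a memory-based join processor conceptually does: the entire "other" dataset is loaded into a lookup table, so memory use scales with that dataset rather than with the main one.

```python
import csv

def build_lookup(other_csv_path, key_column):
    """Load the whole 'other' dataset into memory, keyed by the join column."""
    lookup = {}
    with open(other_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            lookup[row[key_column]] = row          # whole dataset kept in RAM
    return lookup

def join_stream(main_rows, lookup, key_column):
    """Stream the main dataset row by row; only the lookup table stays resident."""
    for row in main_rows:
        match = lookup.get(row[key_column], {})    # left-join style: no match -> no extra columns
        yield {**row, **match}
```

By contrast, the join recipe streams both sides in batches, so neither dataset needs to fit entirely in the heap.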
-
Many thanks @Liev, it's wonderful!
-
@Chiktika I've updated my answer above to clarify and correct.
-
Thanks @Liev for the update. Reading the backend section, I think I found my answer: "If the memory allocation is insufficient, the DSS backend may crash ... and running jobs/scenarios to be aborted."
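For reference, a rough way to gauge whether the other dataset is small enough before relying on the memory-based processor is to estimate its in-memory size in a notebook. This is only a sketch: the dataset name is a placeholder, it assumes the dataiku Python API's Dataset.iter_dataframes() is available in your version, and the pandas figure only approximates the backend's actual JVM footprint.

```python
import dataiku

# Placeholder dataset name; chunked reading avoids loading everything at once.
ds = dataiku.Dataset("other_dataset")
total_bytes = 0
for chunk in ds.iter_dataframes(chunksize=100_000):
    total_bytes += int(chunk.memory_usage(deep=True).sum())

print(f"Approximate in-memory size: {total_bytes / 1024 ** 2:.0f} MiB")
# Leave generous headroom versus the backend Xmx allocation: the JVM
# representation and the rest of the backend also need memory.
```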