Join processor: behavior when dataset bigger than RAM

Chiktika
Chiktika Registered Posts: 24 ✭✭✭✭

Hi,

In a prepare recipe, when I use the processor "Join with other dataset (memory-based)", what happens if the other dataset is bigger than the available RAM?

Does the join fail silently or does it raise an error?

Many thanks for your answer.

Answers

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭

    @Chiktika

    The datasets are not held entirely in memory (unless they are small); instead, the data is streamed in batches. This avoids crashes when dealing with large datasets.

    I hope this clarifies!

    Edit: The above is true when you build a join recipe. The join processor inside the prepare recipe, however, does work in memory. In that case, your limit is not the host memory but the memory allocated through xmx (the JVM maximum heap size), as outlined here.

    For this reason, we do not recommend joins inside a prepare recipe unless the datasets involved are on the smaller side.

  • Chiktika
    Chiktika Registered Posts: 24 ✭✭✭✭

    Many thanks @Liev, it's wonderful

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭

    @Chiktika
    I've updated my answer above to clarify and correct.

  • Chiktika
    Chiktika Registered Posts: 24 ✭✭✭✭

    Thanks @Liev for the update. Reading the backend section, I think I found my answer:


    If the memory allocation is insufficient, the DSS backend may crash .... and running jobs/scenarios to be aborted.
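
For readers wondering why a memory-based join is bounded by the allocated heap while a join recipe is not, here is a minimal Python sketch of the general technique (a hash join that loads one side into a lookup table). It is not DSS's actual implementation; the function name `memory_based_left_join` and the sample rows are hypothetical and only illustrate the idea that the "other" dataset must fit entirely in memory, while the main dataset can be streamed row by row.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def memory_based_left_join(
    main_rows: Iterable[Row], other_rows: Iterable[Row], key: str
) -> Iterator[Row]:
    """Left-join main_rows with other_rows on `key` (illustrative sketch only)."""
    # The whole "other" dataset is loaded into a dict here. This is the step
    # whose size is bounded by available memory: if it does not fit, the
    # process runs out of memory instead of degrading gracefully.
    lookup: Dict[str, Row] = {}
    for row in other_rows:
        lookup[row[key]] = row  # simplification: last row wins on duplicate keys

    # The main dataset is streamed one row at a time and never held in full.
    for row in main_rows:
        match = lookup.get(row[key], {})
        # Columns from the main row take precedence on name collisions.
        yield {**match, **row}


# Hypothetical sample data for illustration.
main = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
other = [{"id": "1", "city": "Paris"}]
result = list(memory_based_left_join(main, other, "id"))
```

In this sketch, memory use grows with the size of `other`, not `main`, which matches the advice above to keep the joined dataset small when using the processor inside a prepare recipe.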

