Spark tasks in a single Spark job
When I run a single Spark recipe, the number of tasks follows the block-size rule, i.e. one task per 128 MB block of input.
But when I run the same job as part of a Spark pipeline, it runs only 8-9 tasks, no matter how big a cluster I choose. This is taken from the Spark UI: we have a 20-node cluster, but the Spark pipeline uses only 2 nodes, whereas the same job run without the pipeline uses the whole cluster.
[Image: Spark pipeline (Spark UI)]
[Image: single Spark recipe (Spark UI)]
As the images above show, recipes run inside a Spark pipeline use only 8-9 tasks, while a normal Spark recipe uses the whole cluster, with the task count scaling with data size and block size.
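For reference, the task count I expect in the non-pipelined case follows directly from the block-size rule described above. A minimal sketch (the input size is a made-up example, not my actual dataset):

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size

def expected_tasks(input_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """One Spark task is created per input block, so round the
    input size up to a whole number of blocks."""
    return math.ceil(input_size_mb / block_size_mb)

# Example: a 2 GB input should produce 16 tasks,
# far more than the 8-9 tasks observed in the pipeline run.
print(expected_tasks(2048))  # 16
```

So for any input larger than roughly 1 GB, the pipeline's fixed 8-9 tasks is well below what the block-size rule would give.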