Added on March 5, 2025 9:39AM
Hello,
I am interested in understanding how to configure Spark settings to ensure optimal resource allocation. Specifically, I am looking for guidance on configuring parameters like spark.driver.cores, spark.dynamicAllocation.initialExecutors, spark.executor.cores, spark.dynamicAllocation.enabled, spark.executor.instances, and spark.driver.memory.
I should mention that I am working in a large enterprise where there is significant competition for resources, so any advice on optimizing these configurations in such an environment would be especially valuable.
Any advice or insights you could share on these topics would be greatly appreciated!
Thank you in advance for your help.
Hi HAFEDH,
If you are working in a large enterprise, I assume your Dataiku instance is managed by an IT or infrastructure team. In that case, it is not really your responsibility to change these settings unless you have been told what effect they have. Changing these parameters can impact performance, which means you can affect the availability of the shared Spark queue. It is up to you to decide whether you want to take that risk, given that you say demand for Spark jobs in your enterprise is significant and that you have no prior experience with Spark sessions.
Nonetheless, technically you can create a PySpark recipe to script whatever you want to benchmark, and try tuning a batch of different configurations empirically. The results will depend on your task's resource needs, as each type of job you want to optimize behaves differently depending on caching, shuffle, memory, and parallelization.
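For example, here is a minimal sketch of such a benchmark, assuming a plain PySpark script. In Dataiku these properties are normally set through the recipe's Spark configuration rather than in code, and some of them (driver cores and memory in particular) only take effect if they are set before the session is created. All values below are placeholders, not recommendations:

```python
# Minimal sketch: set the parameters discussed above and time one
# representative job, so different configurations can be compared empirically.
# The values (2 cores, 4g, 4 executors, ...) are placeholder assumptions;
# adjust them to your cluster and to whatever your admins allow.
import time

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-config-benchmark")                    # hypothetical app name
    .config("spark.driver.cores", "2")                    # placeholder value
    .config("spark.driver.memory", "4g")                  # placeholder value
    .config("spark.executor.cores", "2")                  # placeholder value
    .config("spark.executor.instances", "4")              # usually set either this
    .config("spark.dynamicAllocation.enabled", "true")    # ... or dynamic allocation
    .config("spark.dynamicAllocation.initialExecutors", "2")
    .getOrCreate()
)

# Run a simple, repeatable workload and measure elapsed time.
start = time.time()
df = spark.range(0, 50_000_000).selectExpr("id % 100 AS k", "id AS v")
result = df.groupBy("k").count().collect()
print(f"groups: {len(result)}, elapsed: {time.time() - start:.1f}s")

spark.stop()
```

You would rerun the same workload with each candidate configuration and compare the timings, keeping in mind that a synthetic job like this one will not capture the caching and shuffle behaviour of your real pipelines.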