General / Rule of Thumb Spark Configuration Settings

Solved!
importthepandas

We are using managed Spark over Kubernetes in EKS. We have about 80 active users on our design node, and about half of them use Spark regularly. We've tried to make things easy by creating simple Spark configurations, but we find ourselves changing configurations continuously. With multiple Spark applications, has anyone used spark.dynamicAllocation.enabled? Has anyone established good "boilerplate" configs for different Spark workloads in their deployments?


Operating system used: Ubuntu 18

1 Solution
AlexT
Dataiker

Hi @importthepandas,


We typically suggest starting with a few boilerplate configs and then modifying them as needed, sometimes even on a per-recipe basis rather than in the instance-level config.

A common config could look like:

spark.sql.broadcastTimeout = 3600
spark.port.maxRetries = 200
spark.executor.extraJavaOptions = -Duser.timezone=GMT
spark.driver.extraJavaOptions = -Duser.timezone=GMT
spark.driver.host = 10.192.71.7
spark.network.timeout = 500s
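
If it helps to see those keys in context, here is a minimal sketch (assuming PySpark; the app name is illustrative) of the same baseline applied to a SparkSession. In DSS these would normally live in a named Spark configuration instead, and spark.driver.host is deployment-specific, so the IP is just the example value from above:

from pyspark.sql import SparkSession

# Baseline session mirroring the "common" keys above.
spark = (
    SparkSession.builder
    .appName("common-baseline")
    .config("spark.sql.broadcastTimeout", "3600")
    .config("spark.port.maxRetries", "200")
    .config("spark.executor.extraJavaOptions", "-Duser.timezone=GMT")
    .config("spark.driver.extraJavaOptions", "-Duser.timezone=GMT")
    .config("spark.driver.host", "10.192.71.7")  # deployment-specific example
    .config("spark.network.timeout", "500s")
    .getOrCreate()
)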

Default:

spark.executor.memory = 4g
spark.kubernetes.memoryOverheadFactor = 0.4
spark.driver.memory = 2g
spark.executor.instances = 4
spark.executor.cores = 2
spark.sql.shuffle.partitions = 40

Larger config:

spark.executor.memory = 4g
spark.kubernetes.memoryOverheadFactor = 0.4
spark.driver.memory = 2g
spark.executor.instances = 8
spark.executor.cores = 2
spark.sql.shuffle.partitions = 80


High (for jobs that failed with the previous config):

spark.executor.memory = 6g
spark.kubernetes.memoryOverheadFactor = 0.6
spark.driver.memory = 3g
spark.executor.instances = 8
spark.executor.cores = 2
spark.sql.shuffle.partitions = 80
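
As a sanity check on what these profiles request from Kubernetes: each executor pod's memory request is roughly spark.executor.memory * (1 + spark.kubernetes.memoryOverheadFactor). A quick sketch of the arithmetic for the three profiles above (Python; driver pods are extra):

# Rough executor pod request: memory * (1 + overhead factor)
profiles = {
    "default": (4, 0.4, 4),  # (executor GiB, overhead factor, instances)
    "larger":  (4, 0.4, 8),
    "high":    (6, 0.6, 8),
}
for name, (mem_gib, factor, n) in profiles.items():
    per_pod = mem_gib * (1 + factor)
    print(f"{name}: ~{per_pod:.1f} GiB per executor pod, "
          f"~{per_pod * n:.1f} GiB across {n} executors")
# default: ~5.6 GiB per pod, ~22.4 GiB total
# larger:  ~5.6 GiB per pod, ~44.8 GiB total
# high:    ~9.6 GiB per pod, ~76.8 GiB total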

On a per-job basis, you may want to adjust only the following, as needed (see the sketch after this list):
* spark.executor.instances 
* spark.sql.shuffle.partitions
* spark.executor.memory / spark.kubernetes.memoryOverheadFactor where you have OOM errors
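
For example, a hypothetical shuffle-heavy job might keep the "default" profile and bump just those knobs (a sketch, assuming PySpark; the values are illustrative, not a recommendation):

from pyspark.sql import SparkSession

# Override only the per-job knobs; everything else comes from the profile.
spark = (
    SparkSession.builder
    .appName("shuffle-heavy-job")
    .config("spark.executor.instances", "8")
    .config("spark.sql.shuffle.partitions", "80")
    .getOrCreate()
)

# Shuffle partitions can also be adjusted mid-session if needed:
spark.conf.set("spark.sql.shuffle.partitions", "160")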

Other options, like spark.dynamicAllocation.enabled, are typically best left at their default (false); from what I can see, dynamic allocation on Kubernetes is still being worked on:
https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work
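
That said, if you do want to experiment with it on Spark 3.0+, dynamic allocation can run on Kubernetes without an external shuffle service by enabling shuffle tracking. A hedged sketch of the relevant keys (the executor bounds are illustrative):

spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.shuffleTracking.enabled = true
spark.dynamicAllocation.minExecutors = 2
spark.dynamicAllocation.maxExecutors = 8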



2 Replies
importthepandas
Author

Thank you @AlexT - this is great!
