General / Rule of Thumb Spark Configuration Settings
We are using managed spark over kubernetes in EKS. We have about 80 active users on our design node, about 1/2 of them use spark regularly. We've tried to make things easy by creating simple spark configurations but are finding that we continuously are changing configurations. With multiple spark applications, has anyone usedspark.dynamicAllocation.enabled? Has anyone established good "boilerplate" configs for different spark workloads in their deployments?
Operating system used: ubuntu 18
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @importthepandas
,
We typically suggest going with a few boiler plate configs and then modifying as needed based on your need and sometimes even on recipe basis rather then instance level config:Common config could look like:
spark.sql.broadcastTimeout = 3600
spark.port.maxRetries = 200
spark.executor.extraJavaOptions=-Duser.timezone=GMT
spark.driver.extraJavaOptions=-Duser.timezone=GMT
spark.driver.host = 10.192.71.7
spark.network.timeout = 500sDefault:
spark.executor.memory = 4g
spark.kubernetes.memoryOverheadFactor = 0.4
spark.driver.memory = 2g
spark.executor.instances = 4
spark.executor.cores = 2
spark.sql.shuffle.partitions = 40larger config :
spark.executor.memory = 4g
spark.kubernetes.memoryOverheadFactor = 0.4
spark.driver.memory = 2g
spark.executor.instances = 8
spark.executor.cores = 2
spark.sql.shuffle.partitions = 80
High( for jobs that failed with previous config)spark.executor.memory = 6g
spark.kubernetes.memoryOverheadFactor = 0.6
spark.driver.memory = 3g
spark.executor.instances = 8
spark.executor.cores = 2
spark.sql.shuffle.partitions = 80On a per-job basis, you may want to touch only, as needed:
* spark.executor.instances
* spark.sql.shuffle.partitions
* spark.executor.memory / spark.kubernetes.memoryOverheadFactor where you have OOM errors
For other options like spark.dynamicAllocation.enabled is typically best left as the default false. From what I can see, this is still being worked on
https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work
Answers
-
importthepandas Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 115 Neuron
Thank you @AlexT
- this is great!