We are using managed Spark over Kubernetes in EKS. We have about 80 active users on our design node, and about half of them use Spark regularly. We've tried to make things easy by creating simple Spark configurations, but we find ourselves continuously changing them. With multiple Spark applications, has anyone used spark.dynamicAllocation.enabled? Has anyone established good "boilerplate" configs for different Spark workloads in their deployments?
Operating system used: Ubuntu 18
Hi @importthepandas,
We typically suggest starting with a few boilerplate configs and then modifying them as needed, sometimes even on a per-recipe basis rather than at the instance level.
A common baseline config could look like:
spark.sql.broadcastTimeout = 3600
spark.port.maxRetries = 200
spark.executor.extraJavaOptions=-Duser.timezone=GMT
spark.driver.extraJavaOptions=-Duser.timezone=GMT
spark.driver.host = 10.192.71.7
spark.network.timeout = 500s
Default:
spark.executor.memory = 4g
spark.kubernetes.memoryOverheadFactor = 0.4
spark.driver.memory = 2g
spark.executor.instances = 4
spark.executor.cores = 2
spark.sql.shuffle.partitions = 40
Larger config:
spark.executor.memory = 4g
spark.kubernetes.memoryOverheadFactor = 0.4
spark.driver.memory = 2g
spark.executor.instances = 8
spark.executor.cores = 2
spark.sql.shuffle.partitions = 80
High (for jobs that failed with the previous config):
spark.executor.memory = 6g
spark.kubernetes.memoryOverheadFactor = 0.6
spark.driver.memory = 3g
spark.executor.instances = 8
spark.executor.cores = 2
spark.sql.shuffle.partitions = 80
On a per-job basis, you typically only need to adjust:
* spark.executor.instances
* spark.sql.shuffle.partitions
* spark.executor.memory / spark.kubernetes.memoryOverheadFactor, when you hit OOM errors
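One way to keep these tiers manageable is to store them as named profiles and render per-job overrides into spark-submit flags. This is only a sketch; the profile names, the `conf_flags` helper, and the values (mirroring the tiers above) are illustrative, not a Dataiku or Spark API.

```python
# Illustrative sketch: boilerplate config tiers as named profiles,
# rendered into spark-submit --conf flags with per-job overrides.
PROFILES = {
    "default": {
        "spark.executor.memory": "4g",
        "spark.kubernetes.memoryOverheadFactor": "0.4",
        "spark.driver.memory": "2g",
        "spark.executor.instances": "4",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "40",
    },
    "high": {
        "spark.executor.memory": "6g",
        "spark.kubernetes.memoryOverheadFactor": "0.6",
        "spark.driver.memory": "3g",
        "spark.executor.instances": "8",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "80",
    },
}

def conf_flags(profile, overrides=None):
    """Merge a base profile with per-job overrides and emit --conf flags."""
    conf = {**PROFILES[profile], **(overrides or {})}
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags

# Per-job tweak: bump only executor count and shuffle partitions,
# leaving the rest of the "default" tier untouched.
flags = conf_flags("default", {
    "spark.executor.instances": "8",
    "spark.sql.shuffle.partitions": "80",
})
```

The same merge pattern works whether the flags end up on a spark-submit command line or in a SparkSession builder.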
Other options like spark.dynamicAllocation.enabled are typically best left at their default (false). From what I can see, dynamic allocation on Kubernetes is still being worked on:
https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work
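If you do want to experiment with it anyway, a minimal sketch might look like the following, assuming Spark 3.x, where shuffle tracking lets dynamic allocation run without an external shuffle service; the executor bounds here are placeholders to tune for your cluster:

```
spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.shuffleTracking.enabled = true
spark.dynamicAllocation.minExecutors = 2
spark.dynamicAllocation.maxExecutors = 8
spark.dynamicAllocation.executorIdleTimeout = 60s
```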
Thank you @AlexT - this is great!