Spark on Kubernetes - Initial job has not accepted any resources
Hi,
We've been having a good experience using Spark and containerized execution on our DSS platform. The next step would be to run Spark on Kubernetes, but we're facing some issues.
Things that work:
- Building (Spark) base images and code-env-specific images
- Pushing images to ECR
- Starting an EKS cluster (with the same subnet and security group as the DSS machine)
But the Spark jobs themselves hang on a repeating message:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The log seems to show that connectivity between DSS and the cluster works; the work is just never picked up. Have you perhaps experienced this, and would you know how to fix it?
Regards,
Rik
Best Answer
A t3.micro may be too small a VM to run Spark on the nodes: the executor pods likely request more CPU and memory than such a small instance can allocate, so they stay pending and the driver never receives resources. You should use larger instances, or try passing 500m for spark.kubernetes.executor.request.cores and spark.kubernetes.executor.limit.cores in the properties of the Spark config.
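For reference, a minimal PySpark sketch of how those two properties could be set when building a session by hand; in DSS you would normally enter them as key/value pairs in the Spark configuration instead, and the master URL and image name below are placeholders, not values from this thread:

```python
from pyspark.sql import SparkSession

# Sketch only: the master URL and container image are placeholders for
# your own EKS API endpoint and ECR image; DSS normally fills these in.
spark = (
    SparkSession.builder
    .appName("resource-request-test")
    .master("k8s://https://<your-eks-api-endpoint>:443")
    .config("spark.kubernetes.container.image",
            "<account>.dkr.ecr.<region>.amazonaws.com/<spark-image>")
    # Request only half a CPU per executor pod, so small nodes, which
    # also reserve capacity for system daemons, can still schedule the
    # executors.
    .config("spark.kubernetes.executor.request.cores", "500m")
    .config("spark.kubernetes.executor.limit.cores", "500m")
    .getOrCreate()
)
```

If executors still fail to come up, running kubectl describe pod on a pending executor pod will usually show the scheduler's reason, such as Insufficient cpu or Insufficient memory.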
Answers
You're right: after switching from t3.micro to t3.medium, things work as expected! I'm sure there's still a lot of fun to be had optimizing the Spark settings.