We've been having a good experience using Spark and containerized execution on our DSS platform. The next step would be to run Spark on Kubernetes, but we're facing some issues.
Things that work:
But executing the Spark jobs themselves hangs with a repeating message:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The log seems to show that connectivity between DSS and the cluster works; the work just isn't picked up. Have you perhaps experienced this, and would you know how to fix it?
A t3.micro may be too small a VM to run Spark executors on the nodes. You should use larger instances, or try passing 500m for spark.kubernetes.executor.request.cores and spark.kubernetes.executor.limit.cores in the properties of the Spark config.
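For example, the suggested settings could be added to the Spark config properties like this (key names are the standard Spark-on-Kubernetes properties; the 500m value asks Kubernetes for half a CPU core per executor, which fits on small instances):

```properties
# Request half a CPU core per executor pod, and cap it at the same value,
# so the pods can be scheduled on nodes with limited allocatable CPU.
spark.kubernetes.executor.request.cores  500m
spark.kubernetes.executor.limit.cores    500m
```

With the default request of one full core per executor, a t3.micro (2 vCPUs, much of which is already reserved for system pods) often has no schedulable capacity left, which is consistent with the "Initial job has not accepted any resources" message.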
You're right: after switching from t3.micro to t3.medium, things work as expected! I'm sure there's still a lot of fun to be had optimizing the Spark settings.