AKS - Spark Job Error: java.io.IOException: kubectl failed with code=1 out= err=error: resource name
I have created a managed AKS cluster, set up the Spark configuration, and enabled Managed Spark on K8s.
When I try to run a recipe with this Spark configuration, I get the following error:
[2023/04/10-00:13:08.897] [ActivityExecutor-30] [ERROR] [dku.flow.activity] running compute_Grouped_dataset_NP - Activity failed
java.io.IOException: kubectl failed with code=1 out= err=error: resource name may not be empty
    at com.dataiku.dip.containers.exec.KubectlHelper.executeKubectl(KubectlHelper.java:36)
    at com.dataiku.dip.containers.exec.KubectlHelper.executeKubectlToJSON(KubectlHelper.java:66)
    at com.dataiku.dip.spark.submit.SparkKubernetesSubmitHelper.createAdditionalContextElement(SparkKubernetesSubmitHelper.java:364)
    at com.dataiku.dip.spark.submit.SparkKubernetesSubmitHelper.createAdditionalContextElement(SparkKubernetesSubmitHelper.java:61)
    at com.dataiku.dip.spark.yarnaware.SparkYarnAwareJobHelper.setupRunUsingClient(SparkYarnAwareJobHelper.java:517)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:200)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
    at com.dataiku.dip.recipes.code.sparksql.SparkSQLQueryRecipeRunnerBase.executeJobDef(SparkSQLQueryRecipeRunnerBase.java:37)
    at com.dataiku.dip.recipes.code.sparksql.SparkSQLExecutor.run(SparkSQLExecutor.java:44)
    at com.dataiku.dip.dataflow.exec.MultiEngineRecipeRunner.run(MultiEngineRecipeRunner.java:203)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
Does this mean there are no resources available in my dnss-admin Kubernetes namespace?
How can I work around this?
Thanks!
Operating system used: CentOS 7
Answers
-
Hello
This error, at this point in the job, sometimes occurs on Kubernetes clusters under heavy load. The workaround is to retry the job until an automatic retry of the failing call is added (TBD).
-
Thank you Dan,
I've retried the job multiple times and still get the same error.
Is the root cause Kubernetes not finding resources? Would there be another workaround?
Thank you!
-
I have the same issue after a recent upgrade of EKS from 1.23 to 1.25. Are there any resolution steps, or commands I can run on the DSS backend, to check whether everything is okay?
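
As a possible starting point for checking from the DSS host: "resource name may not be empty" is the message kubectl prints when it is asked to fetch a resource with an empty name, so it points at a lookup that came back blank rather than at cluster capacity. One thing that changed in Kubernetes 1.24 is that token Secrets are no longer created automatically for service accounts. Whether that is the lookup DSS performs here is an assumption on my part; the stack trace does not say. The sketch below, which is not a DSS tool, lists the service accounts in a namespace that have no Secret attached. The function names are made up for illustration, and it assumes kubectl is on the PATH and already pointed at the cluster.

```python
import json
import subprocess


def kubectl_json(args):
    """Run kubectl with JSON output and return the parsed result."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def accounts_without_secrets(sa_list):
    """Return names of service accounts whose 'secrets' field is missing or empty."""
    return [
        item["metadata"]["name"]
        for item in sa_list.get("items", [])
        if not item.get("secrets")
    ]


def main(namespace):
    # Assumption: the namespace is the one set in the DSS Spark configuration.
    sa_list = kubectl_json(["get", "serviceaccounts", "-n", namespace])
    for name in accounts_without_secrets(sa_list):
        print(f"service account {name!r} has no secret listed")
```

Calling `main("dnss-admin")` (the namespace from the original question) prints one line per service account without a Secret. An empty result would suggest looking elsewhere.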