AKS - Spark Job Error: java.io.IOException: kubectl failed with code=1 out= err=error: resource name

daniel10012
Level 1

I have created a managed AKS cluster, set up the Spark configuration, and enabled Managed Spark on K8s.

When I try to run a recipe with this Spark configuration, I get the following error:

[2023/04/10-00:13:08.897] [ActivityExecutor-30] [ERROR] [dku.flow.activity] running compute_Grouped_dataset_NP - Activity failed
java.io.IOException: kubectl failed with code=1 out= err=error: resource name may not be empty

	at com.dataiku.dip.containers.exec.KubectlHelper.executeKubectl(KubectlHelper.java:36)
	at com.dataiku.dip.containers.exec.KubectlHelper.executeKubectlToJSON(KubectlHelper.java:66)
	at com.dataiku.dip.spark.submit.SparkKubernetesSubmitHelper.createAdditionalContextElement(SparkKubernetesSubmitHelper.java:364)
	at com.dataiku.dip.spark.submit.SparkKubernetesSubmitHelper.createAdditionalContextElement(SparkKubernetesSubmitHelper.java:61)
	at com.dataiku.dip.spark.yarnaware.SparkYarnAwareJobHelper.setupRunUsingClient(SparkYarnAwareJobHelper.java:517)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:200)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
	at com.dataiku.dip.recipes.code.sparksql.SparkSQLQueryRecipeRunnerBase.executeJobDef(SparkSQLQueryRecipeRunnerBase.java:37)
	at com.dataiku.dip.recipes.code.sparksql.SparkSQLExecutor.run(SparkSQLExecutor.java:44)
	at com.dataiku.dip.dataflow.exec.MultiEngineRecipeRunner.run(MultiEngineRecipeRunner.java:203)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)


Does this mean there are no resources available in my dnss-admin Kubernetes namespace?
How can I work around this?
Thanks!


Operating system used: CentOS 7

3 Replies
DanDy
Dataiker

Hello

This error, at this point in the stack, sometimes happens when the Kubernetes cluster is under heavy load. The workaround is to retry the job, until an automatic retry of the failing call is added (TBD).
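
If you want to script the retry, here is a minimal sketch of a retry loop with exponential backoff. It is only an illustration: `retry`, `call`, `attempts`, `base_delay` and `sleep` are hypothetical names, not part of DSS, and `call` stands for whatever you use to launch the job.

```python
import time


def retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() until it succeeds or `attempts` tries have failed.

    `call` is a hypothetical zero-argument function that launches the job
    and raises IOError on failure (used here as a stand-in for the
    java.io.IOException in the log).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except IOError as err:
            last_error = err
            # Wait 1x, 2x, 4x ... base_delay between tries, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

If the error is still there after several spaced-out attempts, load on the cluster is probably not the cause.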

 

daniel10012
Level 1
Author

Thank you Dan,

I've retried the job multiple times and still get the same error.
Is the root cause Kubernetes not finding resources? Would there be another workaround?

Thank you!

jacksonisaac
Level 1

I have the same issue after a recent upgrade of EKS from version 1.23 to 1.25. Are there any resolution steps or commands I can try on the DSS backend to check whether everything is okay?
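
One check that might be worth trying, as an assumption rather than a confirmed diagnosis: kubectl prints "resource name may not be empty" when it is handed an empty resource name, and since Kubernetes 1.24 token secrets are no longer created automatically for service accounts. If the failing call looks up a secret by a name taken from the service account, that name could be empty on a newer cluster. The sketch below lists the secret names a service account references; `secret_names` and `fetch_service_account` are hypothetical helpers, and the service account name is a placeholder.

```python
import json
import subprocess


def secret_names(service_account):
    """Return the names of the secrets referenced by a ServiceAccount object.

    `service_account` is the dict parsed from `kubectl get serviceaccount -o json`.
    On Kubernetes 1.24+ the `secrets` field is usually absent, so this returns [].
    """
    return [s["name"] for s in service_account.get("secrets") or [] if s.get("name")]


def fetch_service_account(name, namespace):
    """Fetch a ServiceAccount as a dict by shelling out to kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "serviceaccount", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `secret_names(fetch_service_account("default", "dnss-admin"))` returning an empty list would be consistent with this guess, though it does not prove it.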
