AKS - Spark Job Error: java.io.IOException: kubectl failed with code=1 out= err=error: resource name

daniel10012 Dataiku DSS Core Designer, Registered Posts: 2
edited July 16 in Setup & Configuration

I have created a managed AKS cluster, set up the Spark configuration, and enabled Managed Spark on K8s.

When I try to run a recipe with this Spark Configuration I get the following errors:

[2023/04/10-00:13:08.897] [ActivityExecutor-30] [ERROR] [dku.flow.activity] running compute_Grouped_dataset_NP - Activity failed
java.io.IOException: kubectl failed with code=1 out= err=error: resource name may not be empty

    at com.dataiku.dip.containers.exec.KubectlHelper.executeKubectl(KubectlHelper.java:36)
    at com.dataiku.dip.containers.exec.KubectlHelper.executeKubectlToJSON(KubectlHelper.java:66)
    at com.dataiku.dip.spark.submit.SparkKubernetesSubmitHelper.createAdditionalContextElement(SparkKubernetesSubmitHelper.java:364)
    at com.dataiku.dip.spark.submit.SparkKubernetesSubmitHelper.createAdditionalContextElement(SparkKubernetesSubmitHelper.java:61)
    at com.dataiku.dip.spark.yarnaware.SparkYarnAwareJobHelper.setupRunUsingClient(SparkYarnAwareJobHelper.java:517)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:200)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
    at com.dataiku.dip.recipes.code.sparksql.SparkSQLQueryRecipeRunnerBase.executeJobDef(SparkSQLQueryRecipeRunnerBase.java:37)
    at com.dataiku.dip.recipes.code.sparksql.SparkSQLExecutor.run(SparkSQLExecutor.java:44)
    at com.dataiku.dip.dataflow.exec.MultiEngineRecipeRunner.run(MultiEngineRecipeRunner.java:203)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)


Does this mean there are no resources available in my dnss-admin Kubernetes namespace?
How can I work around this?
Thanks!


Operating system used: CentOS 7

Answers

  • DanDy
    DanDy Dataiker, Dataiku DSS Core Designer, Registered Posts: 8 Dataiker

    Hello

    This error, at this point in the job, sometimes occurs when the Kubernetes cluster is under heavy load. The workaround is to retry the job, until an automatic retry of the failing call is added (TBD).

  • daniel10012
    daniel10012 Dataiku DSS Core Designer, Registered Posts: 2

    Thank you Dan,

    I've retried the job multiple times and still get the same error.
    Is the root cause Kubernetes not finding resources? Would there be another workaround?

    Thank you!

  • jacksonisaac
    jacksonisaac Registered Posts: 2 ✭✭✭

    I have the same issue after a recent upgrade of EKS from 1.23 to 1.25. Are there any resolution steps or commands I can try on the DSS backend to check whether everything is okay?
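
On the question of commands to try from the DSS host: kubectl prints "resource name may not be empty" when it is asked to fetch a resource with an empty name, so one thing worth checking is whether a lookup that feeds that call now returns nothing. One plausible but unconfirmed candidate after a Kubernetes upgrade is the service account's token Secret: since Kubernetes 1.24, service accounts no longer get a token Secret created automatically, so the `secrets` list on a service account can be empty. The Python sketch below reads a service account through `kubectl get serviceaccount ... -o json` and lists its Secret names. The helper names, the choice of the `default` service account, and the idea that DSS reads this field are assumptions for illustration, not taken from DSS source or documentation.

```python
import json
import subprocess
from typing import Any, Dict, List


def secret_names(service_account: Dict[str, Any]) -> List[str]:
    """Return the names of the Secrets listed on a ServiceAccount object.

    Hypothetical helper: the `secrets` field may be missing, null, or empty
    when no token Secret is attached to the service account.
    """
    entries = service_account.get("secrets") or []
    return [entry["name"] for entry in entries if entry.get("name")]


def get_service_account(namespace: str, name: str = "default") -> Dict[str, Any]:
    """Fetch a ServiceAccount as a dict by shelling out to kubectl.

    Assumes kubectl is on PATH and already configured for the cluster.
    Raises subprocess.CalledProcessError if kubectl exits non-zero.
    """
    result = subprocess.run(
        ["kubectl", "get", "serviceaccount", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `secret_names(get_service_account("dnss-admin"))` returning an empty list would be consistent with this hypothesis, while a non-empty list would suggest the empty name comes from somewhere else.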
