I am trying to configure Kubernetes usage in DSS, specifically to attach an Azure Kubernetes Service (AKS) cluster to my DSS instance. I have completed all the steps in the documentation (Initial setup - Dataiku DSS 12 documentation): I pushed the base image, opened ports 1024 to 65535 to the Kubernetes cluster IP (Unable to connect to DSS from container - Dataiku Community), and tested the connection successfully. However, when I run a Spark job in Dataiku, it fails with a connection timeout error. Is there any solution to this?
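For anyone reproducing this, the kind of TCP reachability check I mean can be sketched as follows (the host and port are placeholders, not my real DSS address):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False

# Example (placeholder address): check that the DSS port is reachable
# can_connect("dss.example.internal", 12000)
```

Running this from a pod inside the cluster against the DSS host verifies that the opened port range is reachable from the cluster network, independently of Spark.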
This is the log from the main activity:
[20:51:05] [INFO] [dku.flow.activity] - Run thread failed for activity compute_orders_prepared_NP
com.dataiku.common.server.APIError$SerializedErrorException: Connect to ******:**** [/******] failed: Connection timed out (Connection timed out), caused by: ConnectException: Connection timed out (Connection timed out)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner$3.throwFromErrorFileOrLogs(AbstractSparkBasedRecipeRunner.java:333)
at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:348)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:147)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:116)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:101)
at com.dataiku.dip.recipes.shaker.ShakerSparkRecipeRunner.run(ShakerSparkRecipeRunner.java:50)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:378)
[20:51:05] [INFO] [dku.flow.activity] running compute_orders_prepared_NP - activity is finished
[20:51:05] [ERROR] [dku.flow.activity] running compute_orders_prepared_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Connect to *****:***** [/*******] failed: Connection timed out (Connection timed out), caused by: ConnectException: Connection timed out (Connection timed out)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner$3.throwFromErrorFileOrLogs(AbstractSparkBasedRecipeRunner.java:333)
at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:348)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:147)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:116)
at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:101)
at com.dataiku.dip.recipes.shaker.ShakerSparkRecipeRunner.run(ShakerSparkRecipeRunner.java:50)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:378)
[20:51:05] [INFO] [dku.flow.activity] running compute_orders_prepared_NP - Executing default post-activity lifecycle hook
[20:51:05] [INFO] [dku.flow.activity] running compute_orders_prepared_NP - Done post-activity tasks
It is also worth mentioning that I am using the default Spark configuration described in the documentation above.
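From what I understand of a generic Spark-on-Kubernetes setup, this kind of timeout can occur when executors in the cluster cannot connect back to the driver on the DSS host. Standard Spark properties exist to pin the driver's advertised address and ports; the values below are placeholders to illustrate, not my actual settings:

```
# Placeholder values - standard Spark properties, not my actual configuration
spark.driver.host         <DSS-host-IP-reachable-from-the-cluster>
spark.driver.port         20000
spark.blockManager.port   20001
spark.port.maxRetries     32
```

I have not changed any of these from the defaults, so if one of them needs to be set explicitly for AKS, that may be what I am missing.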
Operating system used: Ubuntu 20.04.1 LTS