Error in Spark process: 'org.apache.spark.sql.DataFrame
fk
Hi,
I am getting the error below when running a Pivot recipe with the Spark engine. Can someone help me with this?
[2020/09/18-18:01:13.644] [FRT-42-FlowRunnable] [INFO] [dku.resourceusage] act.compute_my_mapping_table_by_vehicle_tag_NP - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"IMPALA","jobId":"Build_my_mapping_table_by_vehicle_tag_2020-09-18T16-00-34.073","activityId":"compute_my_mapping_table_by_vehicle_tag_NP","activityType":"recipe","recipeType":"pivot","recipeName":"compute_my_mapping_table_by_vehicle_tag"},"id":"OxHCSrYOL1HENQUu","startTime":1600444836021}
[2020/09/18-18:01:13.644] [FRT-42-FlowRunnable] [INFO] [dku.usage.computeresource.jek] act.compute_my_mapping_table_by_vehicle_tag_NP - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"IMPALA","jobId":"Build_my_mapping_table_by_vehicle_tag_2020-09-18T16-00-34.073","activityId":"compute_my_mapping_table_by_vehicle_tag_NP","activityType":"recipe","recipeType":"pivot","recipeName":"compute_my_mapping_table_by_vehicle_tag"},"id":"OxHCSrYOL1HENQUu","startTime":1600444836021,"endTime":1600444873644}
[2020/09/18-18:01:13.646] [FRT-42-FlowRunnable] [INFO] [dku.flow.activity] act.compute_my_mapping_table_by_vehicle_tag_NP - Run thread failed for activity compute_my_mapping_table_by_vehicle_tag_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in Spark process: 'org.apache.spark.sql.DataFrame org.apache.spark.sql.SQLContext.createDataFrame(org.apache.spark.rdd.RDD, org.apache.spark.sql.types.StructType)'
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleErrorFile(AbstractCodeBasedActivityRunner.java:221)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:186)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:206)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:115)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:84)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor$PivotRecipeSparkModalityCollectionExecutor.run(PivotRecipeSparkExecutor.java:159)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor.getModalities(PivotRecipeSparkExecutor.java:118)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeExecutor.run(PivotRecipeExecutor.java:98)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeRunner.run(PivotRecipeRunner.java:98)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2020/09/18-18:01:13.781] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_my_mapping_table_by_vehicle_tag_NP - activity is finished
[2020/09/18-18:01:13.781] [ActivityExecutor-30] [ERROR] [dku.flow.activity] running compute_my_mapping_table_by_vehicle_tag_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Spark process: 'org.apache.spark.sql.DataFrame org.apache.spark.sql.SQLContext.createDataFrame(org.apache.spark.rdd.RDD, org.apache.spark.sql.types.StructType)'
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleErrorFile(AbstractCodeBasedActivityRunner.java:221)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:186)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:206)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:115)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:84)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor$PivotRecipeSparkModalityCollectionExecutor.run(PivotRecipeSparkExecutor.java:159)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor.getModalities(PivotRecipeSparkExecutor.java:118)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeExecutor.run(PivotRecipeExecutor.java:98)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeRunner.run(PivotRecipeRunner.java:98)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2020/09/18-18:01:13.781] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_my_mapping_table_by_vehicle_tag_NP - Executing default post-activity lifecycle hook
[2020/09/18-18:01:13.784] [ActivityExecutor-30] [DEBUG] [dku.datasets.hdfs] running compute_my_mapping_table_by_vehicle_tag_NP - HDFS dataset handler dataset=IMPALA.my_mapping_table_by_vehicle_tag connection=hdfs_managed cpr=/user/dss2-admin/dss_managed_datasets resolvedPath=/IMPALA/my_mapping_table_by_vehicle_tag connRootSA=nullconnRootWithinSA=/user/dss2-admin/dss_managed_datasets configuredRootPathWithinSA=/user/dss2-admin/dss_managed_datasets/IMPALA/my_mapping_table_by_vehicle_tag effectiveRootPathWithinSA=/user/dss2-admin/dss_managed_datasets/IMPALA/my_mapping_table_by_vehicle_tag
[2020/09/18-18:01:13.785] [ActivityExecutor-30] [DEBUG] [dku.fsproviders.hdfs] running compute_my_mapping_table_by_vehicle_tag_NP - Build HDFSProvider conn=hdfs_managed cpr=/user/dss2-admin/dss_managed_datasets
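The quoted part of the "Error in Spark process" message has the layout of a Java method signature: return type first, then the declaring class and method name, then the parameter types. It resembles the message format the JVM uses for a `NoSuchMethodError`, but the log never names the exception type, so treat that reading as an assumption. A minimal sketch that splits such a signature into its parts; `MethodSignature` and `parse` are hypothetical names, not part of DSS or Spark:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of DSS or Spark): splits a readable JVM method
// signature such as 'RetType pkg.Owner.method(pkg.A, pkg.B)' into its parts.
class MethodSignature {
    final String returnType;
    final String ownerClass;
    final String methodName;
    final List<String> paramTypes;

    // Optional quote, return type, owner class, method name, parameter list, optional quote.
    private static final Pattern SIG =
            Pattern.compile("'?(\\S+) (\\S+)\\.(\\w+)\\(([^)]*)\\)'?");

    MethodSignature(String returnType, String ownerClass,
                    String methodName, List<String> paramTypes) {
        this.returnType = returnType;
        this.ownerClass = ownerClass;
        this.methodName = methodName;
        this.paramTypes = paramTypes;
    }

    static MethodSignature parse(String text) {
        Matcher m = SIG.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a method signature: " + text);
        }
        String params = m.group(4).trim();
        // Split "A, B" into ["A", "B"]; an empty list means a no-argument method.
        List<String> types = params.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(params.split(",\\s*"));
        return new MethodSignature(m.group(1), m.group(2), m.group(3), types);
    }

    public static void main(String[] args) {
        // Signature copied from the "Error in Spark process" line of the log above.
        MethodSignature s = parse("'org.apache.spark.sql.DataFrame "
                + "org.apache.spark.sql.SQLContext.createDataFrame("
                + "org.apache.spark.rdd.RDD, org.apache.spark.sql.types.StructType)'");
        System.out.println("return type: " + s.returnType);
        System.out.println("owner class: " + s.ownerClass);
        System.out.println("method     : " + s.methodName);
        System.out.println("parameters : " + s.paramTypes);
    }
}
```

For the signature in this log, the parsed return type is `org.apache.spark.sql.DataFrame`, i.e. the calling code was looking for a `SQLContext.createDataFrame(RDD, StructType)` that returns that class.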
Answers
Hi,
This error indicates that your DSS and Spark setup is not working correctly. Could you please provide details about how you installed DSS and Spark, including which versions you are using?
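One way to gather the Spark side of these details is to ask the JVM what `SQLContext.createDataFrame(RDD, StructType)` actually returns in the Spark jars DSS uses. In Spark 2.x and later, `DataFrame` is a Scala type alias for `Dataset[Row]`, so the compiled method returns `org.apache.spark.sql.Dataset`; in Spark 1.x it returns the `org.apache.spark.sql.DataFrame` class named in the error. If the check reports `Dataset`, that would be consistent with DSS being configured for a different Spark version than the one installed, though this is only a hypothesis. A minimal sketch, assuming you can run Java on the DSS host with those Spark jars on the classpath; `SparkMethodCheck` and `describe` are hypothetical names, not part of DSS or Spark:

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic (not part of DSS or Spark): reports which return type
// a public method actually has in the jars on the current classpath.
class SparkMethodCheck {

    // Returns the fully qualified return type of owner.method(paramTypes), or a
    // short explanation if the class or method cannot be resolved. Parameter
    // types must be class names (primitives such as "int" are not handled).
    static String describe(String owner, String method, String... paramTypes) {
        ClassLoader loader = SparkMethodCheck.class.getClassLoader();
        try {
            // initialize=false: load the classes without running static initializers
            Class<?> cls = Class.forName(owner, false, loader);
            Class<?>[] params = new Class<?>[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                params[i] = Class.forName(paramTypes[i], false, loader);
            }
            Method m = cls.getMethod(method, params);
            return m.getReturnType().getName();
        } catch (ClassNotFoundException e) {
            return "class not found: " + e.getMessage();
        } catch (NoSuchMethodException e) {
            return "method not found: " + e.getMessage();
        } catch (LinkageError e) {
            // e.g. a dependency of the class (such as the Scala library) is missing
            return "linkage error: " + e;
        }
    }

    public static void main(String[] args) {
        // The method named in the "Error in Spark process" message.
        String result = describe("org.apache.spark.sql.SQLContext", "createDataFrame",
                "org.apache.spark.rdd.RDD", "org.apache.spark.sql.types.StructType");
        System.out.println("SQLContext.createDataFrame(RDD, StructType) returns: " + result);
    }
}
```

Assuming a Spark 2.x-style installation where the jars live under `$SPARK_HOME/jars`, one way to run it is `javac SparkMethodCheck.java && java -cp "$SPARK_HOME/jars/*:." SparkMethodCheck`; point the classpath at whichever Spark installation DSS is configured to use, and include the output along with the installation details.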
If you are working for a Dataiku customer or running an evaluation, please reach out to Dataiku Support (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-dataiku-customers).