Error in Spark process: 'org.apache.spark.sql.DataFrame

fk (Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered, Posts: 4)

Hi,

I am getting the error below while running a pivot recipe with the Spark engine. Can someone help me with this?

[2020/09/18-18:01:13.644] [FRT-42-FlowRunnable] [INFO] [dku.resourceusage] act.compute_my_mapping_table_by_vehicle_tag_NP - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"IMPALA","jobId":"Build_my_mapping_table_by_vehicle_tag_2020-09-18T16-00-34.073","activityId":"compute_my_mapping_table_by_vehicle_tag_NP","activityType":"recipe","recipeType":"pivot","recipeName":"compute_my_mapping_table_by_vehicle_tag"},"id":"OxHCSrYOL1HENQUu","startTime":1600444836021}
[2020/09/18-18:01:13.644] [FRT-42-FlowRunnable] [INFO] [dku.usage.computeresource.jek] act.compute_my_mapping_table_by_vehicle_tag_NP - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"IMPALA","jobId":"Build_my_mapping_table_by_vehicle_tag_2020-09-18T16-00-34.073","activityId":"compute_my_mapping_table_by_vehicle_tag_NP","activityType":"recipe","recipeType":"pivot","recipeName":"compute_my_mapping_table_by_vehicle_tag"},"id":"OxHCSrYOL1HENQUu","startTime":1600444836021,"endTime":1600444873644}
[2020/09/18-18:01:13.646] [FRT-42-FlowRunnable] [INFO] [dku.flow.activity] act.compute_my_mapping_table_by_vehicle_tag_NP - Run thread failed for activity compute_my_mapping_table_by_vehicle_tag_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in Spark process: 'org.apache.spark.sql.DataFrame org.apache.spark.sql.SQLContext.createDataFrame(org.apache.spark.rdd.RDD, org.apache.spark.sql.types.StructType)'
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleErrorFile(AbstractCodeBasedActivityRunner.java:221)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:186)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:206)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:115)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:84)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor$PivotRecipeSparkModalityCollectionExecutor.run(PivotRecipeSparkExecutor.java:159)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor.getModalities(PivotRecipeSparkExecutor.java:118)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeExecutor.run(PivotRecipeExecutor.java:98)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeRunner.run(PivotRecipeRunner.java:98)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2020/09/18-18:01:13.781] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_my_mapping_table_by_vehicle_tag_NP - activity is finished
[2020/09/18-18:01:13.781] [ActivityExecutor-30] [ERROR] [dku.flow.activity] running compute_my_mapping_table_by_vehicle_tag_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Spark process: 'org.apache.spark.sql.DataFrame org.apache.spark.sql.SQLContext.createDataFrame(org.apache.spark.rdd.RDD, org.apache.spark.sql.types.StructType)'
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleErrorFile(AbstractCodeBasedActivityRunner.java:221)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:186)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:206)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:115)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:84)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor$PivotRecipeSparkModalityCollectionExecutor.run(PivotRecipeSparkExecutor.java:159)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeSparkExecutor.getModalities(PivotRecipeSparkExecutor.java:118)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeExecutor.run(PivotRecipeExecutor.java:98)
    at com.dataiku.dip.dataflow.exec.pivot.PivotRecipeRunner.run(PivotRecipeRunner.java:98)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2020/09/18-18:01:13.781] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_my_mapping_table_by_vehicle_tag_NP - Executing default post-activity lifecycle hook
[2020/09/18-18:01:13.784] [ActivityExecutor-30] [DEBUG] [dku.datasets.hdfs] running compute_my_mapping_table_by_vehicle_tag_NP - HDFS dataset handler dataset=IMPALA.my_mapping_table_by_vehicle_tag connection=hdfs_managed cpr=/user/dss2-admin/dss_managed_datasets resolvedPath=/IMPALA/my_mapping_table_by_vehicle_tag connRootSA=nullconnRootWithinSA=/user/dss2-admin/dss_managed_datasets configuredRootPathWithinSA=/user/dss2-admin/dss_managed_datasets/IMPALA/my_mapping_table_by_vehicle_tag effectiveRootPathWithinSA=/user/dss2-admin/dss_managed_datasets/IMPALA/my_mapping_table_by_vehicle_tag
[2020/09/18-18:01:13.785] [ActivityExecutor-30] [DEBUG] [dku.fsproviders.hdfs] running compute_my_mapping_table_by_vehicle_tag_NP - Build HDFSProvider conn=hdfs_managed cpr=/user/dss2-admin/dss_managed_datasets 
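
For context, the method signature quoted in the error corresponds to the Spark API call sketched below. This is only a minimal Scala illustration with hypothetical column names and data, not the recipe's actual code; in my understanding, a "method not found" style error on createDataFrame like this typically appears when the Spark version on the cluster does not match the version the job was built against.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SQLContext, SparkSession}

object CreateDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("createDataFrame-sketch").master("local[*]").getOrCreate()
    val sqlContext: SQLContext = spark.sqlContext

    // Hypothetical rows standing in for the pivot recipe's input dataset
    val rdd: RDD[Row] = spark.sparkContext.parallelize(Seq(Row("v1", "tagA"), Row("v2", "tagB")))
    val schema = StructType(Seq(
      StructField("vehicle", StringType),
      StructField("tag", StringType)
    ))

    // The call named in the stack trace:
    //   org.apache.spark.sql.SQLContext.createDataFrame(RDD, StructType)
    // If the Spark jars on the driver/executor classpath don't match the
    // version this was compiled against, this line can fail with the kind
    // of "method not found" message shown in the log above.
    val df = sqlContext.createDataFrame(rdd, schema)
    df.show()

    spark.stop()
  }
}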
