EMR logs verification

Sudheer
Sudheer Dataiku DSS Core Designer, Registered Posts: 5 ✭✭✭✭

Hi Team,

We have configured DSS and connected it to an EMR cluster (edge node), and I am running jobs by selecting that cluster at the project level. I would like to know where I can see that the cluster is actually being used, and how I can confirm that those jobs are running on the EMR cluster. Is there a particular log folder for this on the EMR side?

Please help me on this.

Regards,

Sudheer

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    Hi,

    You can check in the cluster's YARN UI (on the master node, port 8088) whether YARN applications are being submitted there.

    If you are running Spark jobs, you will also find lines in the job logs such as "Connecting to ResourceManager at THE_ADDRESS_OF_YOUR_CLUSTER" or "Application report for application_XXXXXXXXX".
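
    For illustration, here is a minimal sketch (not from this thread, and not the standard DSS recipe template) of a quick check you could run inside a PySpark recipe: it prints the Spark master and the YARN application ID, which you can then match against the YARN UI and the "Application report for application_..." log lines.

        # Minimal sketch (illustrative): confirm where this Spark job actually runs.
        # If the job is really submitted to the EMR cluster, the master should be "yarn"
        # and the application ID should appear in the cluster's YARN UI.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        sc = spark.sparkContext

        print("Spark master:  ", sc.master)         # expect "yarn" when running on the cluster
        print("Application ID:", sc.applicationId)  # an application_... id listed in the YARN UI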

  • Sudheer
    Sudheer Dataiku DSS Core Designer, Registered Posts: 5 ✭✭✭✭
    edited July 17

    Hi, thank you for the update.

    I have configured Spark as well. I am trying to run sample PySpark code on the cluster, but I am getting the error below. Could you please help me with this?

    [2020/03/09-11:07:28.167] [ActivityExecutor-33] [ERROR] [dku.flow.activity] running compute_myspark_NP - Activity failed
    java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at com.dataiku.dip.security.impersonation.HadoopDelegationTokensGenerator.generateSparkTokenFile(HadoopDelegationTokensGenerator.java:47)
        at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:191)
        at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:151)
        at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:147)
        at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:125)
        at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:108)
        at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:93)
        at com.dataiku.dip.recipes.code.spark.PySparkRecipeRunner.run(PySparkRecipeRunner.java:63)
        at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:380)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
        ... 9 more
    [2020/03/09-11:07:28.168] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Executing default post-activity lifecycle hook
    [2020/03/09-11:07:28.170] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Removing samples for DKU_TUTORIAL_BASICS_2.myspark
    [2020/03/09-11:07:28.172] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Done po
  • Sudheer
    Sudheer Dataiku DSS Core Designer, Registered Posts: 5 ✭✭✭✭

    Hi, I have tried to connect to the master node on port 8088, but I am unable to reach it. Do I need to add https://masternode:8088 to the Hadoop key/value settings in DSS?

    Could you please give more information on this? (A related connectivity check is sketched below this post.)

    Regards,

    Sudheer
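
    As a related check, you can query the YARN ResourceManager REST API from the edge node to list the applications running on the cluster. This is a minimal sketch, assuming the placeholder hostname "masternode" and the default ResourceManager port 8088, which serves plain HTTP rather than HTTPS; on EMR that port is typically only reachable through the cluster's security groups or an SSH tunnel.

        # Illustrative connectivity check: list YARN applications via the ResourceManager REST API.
        # "masternode" is a placeholder for the EMR master node's private DNS name.
        import json
        import urllib.request

        RM_URL = "http://masternode:8088/ws/v1/cluster/apps"  # default RM port; plain HTTP, not HTTPS

        with urllib.request.urlopen(RM_URL, timeout=10) as resp:
            apps = json.load(resp).get("apps") or {}

        # Each entry is a YARN application; jobs submitted to the cluster
        # appear here with an "application_..." id.
        for app in apps.get("app", []):
            print(app["id"], app["name"], app["state"])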
