EMR logs verification

Sudheer
Sudheer Dataiku DSS Core Designer, Registered Posts: 5 ✭✭✭✭

Hi Team,

We have configured DSS and connected it to an EMR cluster (edge node), and I am running jobs by selecting that cluster at the project level. I would like to know where I can see that the cluster is actually being used, and how I can confirm that those jobs are running on the EMR cluster. Is there a particular log folder for this on the EMR side?

Please help me on this.

Regards,

Sudheer

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    Hi,

    You can check in the cluster's YARN UI (on the master node, port 8088) whether YARN applications are being submitted there.

    If you are running Spark jobs, you will also find lines in the job logs such as "Connecting to ResourceManager at THE_ADDRESS_OF_YOUR_CLUSTER" or "Application report for application_XXXXXXXXX".
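
    For illustration, here is a minimal sketch (not from this thread, and not the standard DSS recipe template) of a quick check you could run inside a PySpark recipe: it prints the Spark master and the YARN application ID, which you can then match against the YARN UI and the "Application report for application_..." log lines.

        # Minimal sketch (illustrative): confirm where this Spark job actually runs.
        # If the job is really submitted to the EMR cluster, the master should be "yarn"
        # and the application ID should appear in the cluster's YARN UI.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        sc = spark.sparkContext

        print("Spark master:  ", sc.master)         # expect "yarn" when running on the cluster
        print("Application ID:", sc.applicationId)  # an application_... id listed in the YARN UI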

  • Sudheer
    Sudheer Dataiku DSS Core Designer, Registered Posts: 5 ✭✭✭✭
    edited July 17

    Hi, thank you for the update.

    I have configured Spark as well. I am trying to run sample PySpark code on the cluster, but I am getting the error below. Could you please help me with this?

    [2020/03/09-11:07:28.167] [ActivityExecutor-33] [ERROR] [dku.flow.activity] running compute_myspark_NP - Activity failed
    java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at com.dataiku.dip.security.impersonation.HadoopDelegationTokensGenerator.generateSparkTokenFile(HadoopDelegationTokensGenerator.java:47)
        at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:191)
        at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:151)
        at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:147)
        at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:125)
        at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:108)
        at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:93)
        at com.dataiku.dip.recipes.code.spark.PySparkRecipeRunner.run(PySparkRecipeRunner.java:63)
        at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:380)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
        ... 9 more
    [2020/03/09-11:07:28.168] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Executing default post-activity lifecycle hook
    [2020/03/09-11:07:28.170] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Removing samples for DKU_TUTORIAL_BASICS_2.myspark
    [2020/03/09-11:07:28.172] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Done po
  • Sudheer
    Sudheer Dataiku DSS Core Designer, Registered Posts: 5 ✭✭✭✭

    Hi, I have tried to connect to the master node on port 8088, but I am unable to reach it. Do I need to add https://masternode:8088 to the Hadoop key/value settings in DSS?

    Could you please give more information on this? (A related connectivity check is sketched below this post.)

    Regards,

    Sudheer
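
    As a related check, you can query the YARN ResourceManager REST API from the edge node to list the applications running on the cluster. This is a minimal sketch, assuming the placeholder hostname "masternode" and the default ResourceManager port 8088, which serves plain HTTP rather than HTTPS; on EMR that port is typically only reachable through the cluster's security groups or an SSH tunnel.

        # Illustrative connectivity check: list YARN applications via the ResourceManager REST API.
        # "masternode" is a placeholder for the EMR master node's private DNS name.
        import json
        import urllib.request

        RM_URL = "http://masternode:8088/ws/v1/cluster/apps"  # default RM port; plain HTTP, not HTTPS

        with urllib.request.urlopen(RM_URL, timeout=10) as resp:
            apps = json.load(resp).get("apps") or {}

        # Each entry is a YARN application; jobs submitted to the cluster
        # appear here with an "application_..." id.
        for app in apps.get("app", []):
            print(app["id"], app["name"], app["state"])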
