EMR logs verification
Hi Team,
We have configured DSS and connected it to an EMR cluster (edge node), and I am running jobs by selecting the cluster at the project level. I would like to know where I can see that the cluster is being used, and how I can confirm that those jobs are actually running on the EMR cluster. Is there a particular logs folder for this on the EMR side?
Please help me with this.
Regards,
Sudheer
Answers
-
Hi,
You will be able to find out in the cluster's YARN UI (master node on port 8088) if YARN applications are getting submitted there.
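The ResourceManager behind the YARN UI also serves a REST API at /ws/v1/cluster/apps, so the same check can be scripted. The sketch below queries it and lists application IDs, names and states. It assumes plain HTTP on port 8088 with no Kerberos/SPNEGO authentication in front of the ResourceManager; MASTER_NODE_HOST is a placeholder for your master node's address.

```python
import json
import urllib.request


def parse_apps(payload):
    """Extract (id, name, state) tuples from a /ws/v1/cluster/apps JSON payload.

    YARN returns {"apps": null} when there are no applications, so that case
    is mapped to an empty list.
    """
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["state"]) for a in apps]


def list_yarn_apps(rm_url):
    """Query the ResourceManager REST API (assumed reachable over plain HTTP)."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/apps"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return parse_apps(payload)
```

For example, list_yarn_apps("http://MASTER_NODE_HOST:8088") returns one tuple per application known to the ResourceManager. If your DSS jobs are being submitted to the cluster, their applications should show up in this list while or after they run.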
If you are running Spark jobs, you will also find lines in the job logs such as "Connecting to ResourceManager at THE_ADDRESS_OF_YOUR_CLUSTER" or "Application report for application_XXXXXXXXX".
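To look for those two markers in a saved job log, a small scan like the one below can help. It is only a sketch: the regular expressions assume the message wording quoted above, which may differ between Spark versions.

```python
import re

# Patterns built from the log messages quoted above (wording may vary by Spark version).
RM_PATTERN = re.compile(r"Connecting to ResourceManager at (\S+)")
APP_PATTERN = re.compile(r"Application report for (application_\d+_\d+)")


def find_yarn_evidence(log_lines):
    """Return (ResourceManager addresses, YARN application IDs) seen in the log."""
    rm_addresses = set()
    app_ids = set()
    for line in log_lines:
        m = RM_PATTERN.search(line)
        if m:
            rm_addresses.add(m.group(1))
        m = APP_PATTERN.search(line)
        if m:
            app_ids.add(m.group(1))
    return sorted(rm_addresses), sorted(app_ids)
```

For example, with open("job.log") as f: rm, apps = find_yarn_evidence(f), where "job.log" is a hypothetical file holding a job log. A non-empty result means the job talked to a ResourceManager and was tracked as a YARN application.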
-
Hi, thank you for the update.
I have configured Spark as well. I am trying to run sample PySpark code on the cluster, but I am getting the error below. Could you please help me with this?
[2020/03/09-11:07:28.167] [ActivityExecutor-33] [ERROR] [dku.flow.activity] running compute_myspark_NP - Activity failed
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at com.dataiku.dip.security.impersonation.HadoopDelegationTokensGenerator.generateSparkTokenFile(HadoopDelegationTokensGenerator.java:47)
    at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:191)
    at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:151)
    at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:147)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:125)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:108)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:93)
    at com.dataiku.dip.recipes.code.spark.PySparkRecipeRunner.run(PySparkRecipeRunner.java:63)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:380)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    ... 9 more
[2020/03/09-11:07:28.168] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Executing default post-activity lifecycle hook
[2020/03/09-11:07:28.170] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Removing samples for DKU_TUTORIAL_BASICS_2.myspark
[2020/03/09-11:07:28.172] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Done po
-
Hi, I have tried to connect to the master node on port 8088, but I am unable to connect. Do I need to add https://masternode:8088 to the Hadoop key/value list in DSS?
Could you please give more information on this?
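Before changing any DSS setting, it may help to check whether port 8088 on the master node is reachable at all from the DSS host. The sketch below only tests raw TCP reachability; it does not involve any DSS or Hadoop configuration, and the host name you pass in is up to you.

```python
import socket


def port_is_reachable(host, port=8088, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Run it from the DSS machine, for example port_is_reachable("masternode"). A False result points to a network-level problem (name resolution, routing or a blocked port) between the two hosts rather than to how the URL is entered in DSS.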
Regards,
Sudheer