
EMR logs verification

Level 1

Hi Team,

 

We have configured DSS and connected it to an EMR cluster (via the edge node), and I am running jobs by selecting that cluster at the project level. I would like to know where I can see that the cluster is actually being used, and how I can confirm those jobs are running on the EMR cluster. Is there a particular logs folder for this on the EMR side?

 

 

Please help me on this.

 

Regards,

Sudheer

3 Replies
Dataiker

Hi,

You can check in the cluster's YARN UI (master node, port 8088) whether YARN applications are being submitted there.
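As a complement to the UI, the ResourceManager also exposes a REST API on the same port (`/ws/v1/cluster/apps`), which lists submitted applications. A minimal sketch in Python, assuming an unsecured default endpoint (the helper names here are illustrative, not part of DSS):

```python
import json
from urllib.request import urlopen

def summarize_apps(payload):
    """Extract (id, name, state) tuples from a parsed ResourceManager
    /ws/v1/cluster/apps response; tolerate an empty cluster."""
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a["id"], a["name"], a["state"]) for a in apps]

def list_yarn_apps(rm_url):
    """Fetch the application list from the YARN ResourceManager REST API,
    e.g. list_yarn_apps("http://masternode:8088")."""
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/apps") as resp:
        return summarize_apps(json.load(resp))
```

If your DSS jobs appear in that list (or in the UI) with `application_...` IDs, they are running on the EMR cluster.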

If you are running Spark jobs, you'll also find log lines such as "Connecting to ResourceManager at THE_ADDRESS_OF_YOUR_CLUSTER" or "Application report for application_XXXXXXXXX".
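Those two markers can be grepped for mechanically. A small sketch of a log scanner (a hypothetical helper, not a DSS feature) that pulls out the ResourceManager address and the YARN application IDs from a driver log:

```python
import re

# Lines indicating the Spark driver talked to the cluster's YARN ResourceManager.
MARKERS = [
    re.compile(r"Connecting to ResourceManager at (\S+)"),
    re.compile(r"Application report for (application_\S+)"),
]

def find_yarn_evidence(log_text):
    """Return ResourceManager addresses and YARN application IDs seen in a log."""
    hits = []
    for line in log_text.splitlines():
        for pattern in MARKERS:
            m = pattern.search(line)
            if m:
                hits.append(m.group(1))
    return hits
```

A non-empty result means the job was submitted to YARN rather than run locally.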

Level 1
Author

Hi, I have tried to connect to the master node on port 8088, but I am unable to connect. Do I need to add https://masternode:8088 to the Hadoop key/value settings in DSS?

Could you please give more information on this?

 

Regards,

Sudheer

Level 1
Author

Hi, thank you for the update.

I have configured Spark as well, and I am trying to run sample PySpark code on the cluster, but I am getting the error below. Could you please help me with this?

 

[2020/03/09-11:07:28.167] [ActivityExecutor-33] [ERROR] [dku.flow.activity] running compute_myspark_NP - Activity failed
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
	at com.dataiku.dip.security.impersonation.HadoopDelegationTokensGenerator.generateSparkTokenFile(HadoopDelegationTokensGenerator.java:47)
	at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:191)
	at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:151)
	at com.dataiku.dip.recipes.code.spark.SparkBasedActivityHelper.configure(SparkBasedActivityHelper.java:147)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:125)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:108)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runPySpark(AbstractSparkBasedRecipeRunner.java:93)
	at com.dataiku.dip.recipes.code.spark.PySparkRecipeRunner.run(PySparkRecipeRunner.java:63)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:380)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
	... 9 more
[2020/03/09-11:07:28.168] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Executing default post-activity lifecycle hook
[2020/03/09-11:07:28.170] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Removing samples for DKU_TUTORIAL_BASICS_2.myspark
[2020/03/09-11:07:28.172] [ActivityExecutor-33] [INFO] [dku.flow.activity] running compute_myspark_NP - Done po
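A `NoClassDefFoundError` for `org/apache/hadoop/conf/Configuration` usually means the Hadoop client libraries are not on the classpath DSS uses when submitting the job; on an EMR edge node, a reasonable first check is the output of `hadoop classpath` and re-running DSS's Hadoop integration setup so those jars are picked up. When triaging traces like this one, the innermost `Caused by:` is the part to act on. A sketch of a small helper (hypothetical, not part of DSS) that extracts it from a log:

```python
import re

def innermost_cause(trace):
    """Return (exception_class, message) for the deepest 'Caused by:' entry
    in a Java stack trace string, or None if there is no cause chain."""
    causes = re.findall(
        r"Caused by: ([\w.$]+(?:Exception|Error))(?:: ([^\t\n]*))?", trace
    )
    if not causes:
        return None
    cls, msg = causes[-1]
    return (cls, msg.strip())
```

Run against the trace above, this would point at `java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration`, i.e. a missing Hadoop dependency rather than a problem in the PySpark code itself.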