Spark on YARN to local dataset?

I'm trying to run a job that reads a file on HDFS, does something with it, and saves the result on my driver host. I'm getting:

Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 19, s40695.dc4.local): hadoop-terminal

[2015/11/10-16:20:38.171] [Exec-30] [INFO] [dku.utils] - at Method)
[2015/11/10-16:20:38.172] [Exec-30] [INFO] [dku.utils] - at$1.lookupAllHostAddr(
[2015/11/10-16:20:38.172] [Exec-30] [INFO] [dku.utils] - at
[2015/11/10-16:20:38.173] [Exec-30] [INFO] [dku.utils] - at
[2015/11/10-16:20:38.173] [Exec-30] [INFO] [dku.utils] - at
[2015/11/10-16:20:38.174] [Exec-30] [INFO] [dku.utils] - at
[2015/11/10-16:20:38.175] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(
[2015/11/10-16:20:38.175] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(
[2015/11/10-16:20:38.175] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(
[2015/11/10-16:20:38.176] [Exec-30] [INFO] [dku.utils] - at
[2015/11/10-16:20:38.176] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(
[2015/11/10-16:20:38.177] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.DefaultRequestDirector.execute(
[2015/11/10-16:20:38.177] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.AbstractHttpClient.doExecute(
[2015/11/10-16:20:38.177] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.CloseableHttpClient.execute(
[2015/11/10-16:20:38.178] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.CloseableHttpClient.execute(
[2015/11/10-16:20:38.178] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.CloseableHttpClient.execute(
[2015/11/10-16:20:38.179] [Exec-30] [INFO] [dku.utils] - at com.dataiku.common.apiclient.InternalAPIClient.postFormToJSON(
[2015/11/10-16:20:38.179] [Exec-30] [INFO] [dku.utils] - at com.dataiku.dip.spark.RemoteDatasetRDD$$anonfun$saveToDataset$1.apply(RemoteDatasetRDD.scala:190)
[2015/11/10-16:20:38.180] [Exec-30] [INFO] [dku.utils] - at com.dataiku.dip.spark.RemoteDatasetRDD$$anonfun$saveToDataset$1.apply(RemoteDatasetRDD.scala:179)


I've tried setting the IP explicitly with […], but that didn't help 😞

1 Reply

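Generally speaking, most of the Hadoop ecosystem won't work unless every host can resolve the names of all the hosts involved, itself included. Here the name that fails to resolve is hadoop-terminal, so you would need to add an entry mapping its IP address to hadoop-terminal in your /etc/hosts file.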
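If it helps, here is a minimal Python sketch for checking name resolution on a given machine before editing anything. It is only an illustration: the helper names `resolve` and `hosts_entry` are made up for this example, and the address 10.0.0.5 is a placeholder, not the real IP of hadoop-terminal. Judging by the stack trace, the failing lookup seems to happen on the worker node (s40695.dc4.local), so that is probably where the check is most informative, but that is an assumption.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses `hostname` resolves to, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name is unknown to both /etc/hosts and DNS on this machine.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def hosts_entry(ip, hostname):
    """Format one /etc/hosts line: the IP address, a tab, then the hostname."""
    return f"{ip}\t{hostname}"


# Hypothetical placeholder address; replace it with the real IP of hadoop-terminal.
EXAMPLE_LINE = hosts_entry("10.0.0.5", "hadoop-terminal")
```

Calling `resolve("hadoop-terminal")` on a node returns an empty list when that node cannot resolve the name. In that case a line like `EXAMPLE_LINE`, with the real address, is what would go into /etc/hosts on that node.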