Spark on YARN to local dataset?

q666

I'm trying to run a job that reads a file on HDFS, does something with it, and saves the result on my driver host. I'm getting:


Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 19, s40695.dc4.local): java.net.UnknownHostException: hadoop-terminal

[2015/11/10-16:20:38.171] [Exec-30] [INFO] [dku.utils] - at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
[2015/11/10-16:20:38.172] [Exec-30] [INFO] [dku.utils] - at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
[2015/11/10-16:20:38.172] [Exec-30] [INFO] [dku.utils] - at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
[2015/11/10-16:20:38.173] [Exec-30] [INFO] [dku.utils] - at java.net.InetAddress.getAllByName0(InetAddress.java:1239)
[2015/11/10-16:20:38.173] [Exec-30] [INFO] [dku.utils] - at java.net.InetAddress.getAllByName(InetAddress.java:1155)
[2015/11/10-16:20:38.174] [Exec-30] [INFO] [dku.utils] - at java.net.InetAddress.getAllByName(InetAddress.java:1091)
[2015/11/10-16:20:38.175] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:44)
[2015/11/10-16:20:38.175] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:259)
[2015/11/10-16:20:38.175] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:159)
[2015/11/10-16:20:38.176] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
[2015/11/10-16:20:38.176] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:610)
[2015/11/10-16:20:38.177] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:445)
[2015/11/10-16:20:38.177] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
[2015/11/10-16:20:38.177] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
[2015/11/10-16:20:38.178] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
[2015/11/10-16:20:38.178] [Exec-30] [INFO] [dku.utils] - at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
[2015/11/10-16:20:38.179] [Exec-30] [INFO] [dku.utils] - at com.dataiku.common.apiclient.InternalAPIClient.postFormToJSON(InternalAPIClient.java:153)
[2015/11/10-16:20:38.179] [Exec-30] [INFO] [dku.utils] - at com.dataiku.dip.spark.RemoteDatasetRDD$$anonfun$saveToDataset$1.apply(RemoteDatasetRDD.scala:190)
[2015/11/10-16:20:38.180] [Exec-30] [INFO] [dku.utils] - at com.dataiku.dip.spark.RemoteDatasetRDD$$anonfun$saveToDataset$1.apply(RemoteDatasetRDD.scala:179)

I've tried explicitly setting the IP with spark.driver.host, but that didn't help :(

Answers

  • Clément_Stenac
    Hi,

    Generally speaking, most of the Hadoop ecosystem won't work unless every host can resolve its own hostname. You would need to add an entry mapping 127.0.0.1 to hadoop-terminal in your /etc/hosts file.
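To check whether a machine can resolve a given name before and after editing /etc/hosts, a small script along these lines may help. It is only a sketch using Python's standard `socket` module; the helper name `resolve` is made up for illustration and is not part of DSS or Spark.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a hostname resolves to, or [] if the lookup fails."""
    try:
        # getaddrinfo goes through the system resolver, so /etc/hosts entries are honoured
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Roughly the Python counterpart of java.net.UnknownHostException
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({info[4][0] for info in infos})
```

For example, `resolve(socket.gethostname())` shows whether the machine can resolve its own name, and `resolve("hadoop-terminal")` shows whether it can resolve the DSS host. An empty list means the lookup failed. The trace above reports the failed lookup from s40695.dc4.local, so it may be worth running the same check on that node as well.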