Spark can’t read my HDFS datasets
Benoni
Registered Posts: 23 ✭✭✭✭
Hello,
Spark won't see hdfs:/// and just looks for file:/// when I'm trying to process an HDFS-managed dataset. I followed the how-to link at:
https://www.dataiku.com/learn/guide/spark/tips-and-troubleshooting.html
However, I couldn't figure out what to edit. Here is my env-spark.sh in DATA_DIR/bin/:
```
export DKU_SPARK_ENABLED=true
export DKU_SPARK_HOME='/usr/local/spark'
export DKU_SPARK_VERSION='2.4.2'
export PYSPARK_DRIVER_PYTHON="$DKUPYTHONBIN"
export DKU_PYSPARK_PYTHONPATH='/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.10.7-src.zip'
if [ -n "$DKURBIN" ]; then
export SPARKR_DRIVER_R="$DKURBIN"
fi
```
My Hadoop is located at /usr/local/hadoop and Spark at /usr/local/spark.
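From what I can tell, nothing in that file points Spark at the Hadoop configuration, so it falls back to its built-in default of fs.defaultFS=file:///. A quick way to check this (a sketch, assuming the stock Hadoop layout under /usr/local/hadoop):
```
# Is any Hadoop config visible to the DSS/Spark environment?
# If this prints <unset>, Spark never loads core-site.xml and
# defaults to the local filesystem (file:///)
echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<unset>}"

# What the cluster's default filesystem actually is, per core-site.xml
grep -A1 'fs.defaultFS' /usr/local/hadoop/etc/hadoop/core-site.xml

# Same question, answered by the Hadoop CLI itself
/usr/local/hadoop/bin/hdfs getconf -confKey fs.defaultFS
```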
Can you please help me? Thanks in advance.
Best Answer
Solved it by adding HADOOP_INSTALL and HADOOP_CONF_DIR to env-spark.sh:
```
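# Point Spark at the Hadoop install and its config directory; this is
# what makes fs.defaultFS resolve to hdfs:// instead of file:///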
export HADOOP_INSTALL='/usr/local/hadoop'
export HADOOP_CONF_DIR='/usr/local/hadoop/etc/hadoop'
export DKU_SPARK_ENABLED=true
export DKU_SPARK_HOME='/usr/local/spark'
export DKU_SPARK_VERSION='2.4.2'
export PYSPARK_DRIVER_PYTHON="$DKUPYTHONBIN"
export DKU_PYSPARK_PYTHONPATH='/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.10.7-src.zip'
if [ -n "$DKURBIN" ]; then
export SPARKR_DRIVER_R="$DKURBIN"
fi
```
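After restarting DSS so the new environment is picked up, you can verify that Spark now resolves the default filesystem to HDFS. A minimal check (the restart command assumes you run it from your DATA_DIR; the expected hostname in the output is just an example):
```
# Restart DSS so env-spark.sh is re-read (run from your DATA_DIR)
./bin/dss restart

# Ask Spark which filesystem it defaults to; this should now print
# something like hdfs://<namenode>:8020 rather than file:///
/usr/local/spark/bin/spark-shell <<'EOF'
println(spark.sparkContext.hadoopConfiguration.get("fs.defaultFS"))
EOF
```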
Thanks anyway, my dudes