How to read a text file using PySpark in Dataiku
I'm new to Dataiku and am trying to read a text file using PySpark. I tried creating a DataFrame with spark.read.text() and also used the Spark context to create an RDD, but both methods throw errors. When I create a Spark context, it fails with "RuntimeError: Java gateway process exited before sending its port number". When I use a Spark session instead, it reports that the "device has no space left". Below is the code I'm using.
Spark context:
# Import Dataiku APIs, including the PySpark layer
import dataiku
from dataiku import spark as dkuspark
# Import Spark APIs, both the base SparkContext and higher level SQLContext
from pyspark import SparkContext
from pyspark.sql import SQLContext
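# Reuse the running SparkContext if one already exists instead of creating a second one
sc = SparkContext.getOrCreate()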
sqlContext = SQLContext(sc)
dataset1 = dataiku.Dataset("Dataset")
df1 = dkuspark.get_dataframe(sqlContext, dataset1)
Spark session:
# Initialize SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('test').getOrCreate()
Your assistance would really help me a lot. Thanks!
Best Answer
Hi @AlexT, thanks for replying. It seems it was a temporary issue.
Answers
Alexandru (Dataiker)
Hi,
Could you please open a support ticket and attach the job diagnostics:
https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html#getting-a-job-diagnosis
Thanks
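For anyone who hits the same two errors, a quick environment check before opening a ticket might help narrow things down. The sketch below is only a guess at the usual suspects: the "Java gateway process exited" error often appears when PySpark cannot start a JVM, and "no space left on device" means a filesystem is full. The function name check_spark_prerequisites and the 1 GB threshold are hypothetical, and the temp directory Python reports may differ from the scratch directory Spark is configured to use on your instance, so treat the output as a hint rather than a diagnosis.

```python
import os
import shutil
import tempfile


def check_spark_prerequisites(min_free_gb=1.0):
    """Collect basic facts about Java availability and temp-directory free space.

    min_free_gb is an arbitrary illustrative threshold, not a Spark requirement.
    """
    tmp_dir = tempfile.gettempdir()
    usage = shutil.disk_usage(tmp_dir)
    free_gb = usage.free / 1024 ** 3
    return {
        # None means the variable is not set in this process's environment
        "java_home": os.environ.get("JAVA_HOME"),
        # None means no `java` executable was found on PATH
        "java_on_path": shutil.which("java"),
        "tmp_dir": tmp_dir,
        "tmp_free_gb": round(free_gb, 2),
        "tmp_has_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for key, value in check_spark_prerequisites().items():
        print(f"{key}: {value}")
```

Run it in the same notebook or recipe environment where the Spark code fails, since the environment variables and disks visible there may not match your shell.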