Hive/Dremio table to PySpark DataFrame
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs (internal22 is a Hive table)
internal = dataiku.Dataset("internal22")
internal_df = dkuspark.get_dataframe(sqlContext, internal)

internal_df.count()  # returns 0, but the table actually contains millions of records
Answers
Alexandru (Dataiker)
Hi @sigma_loge,
Could you try running the same or similar basic PySpark code in a PySpark recipe and share the job diagnostics with support?
https://doc.dataiku.com/dss/latest/code_recipes/pyspark.html#anatomy-of-a-basic-pyspark-recipe
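For reference, a minimal recipe along the lines of that doc page might look like the sketch below; the dataset names "my_input" and "my_output" are placeholders for your own datasets:

import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read the recipe input into a Spark DataFrame
input_dataset = dataiku.Dataset("my_input")  # placeholder dataset name
df = dkuspark.get_dataframe(sqlContext, input_dataset)

# Quick sanity checks on what Spark actually sees
df.printSchema()
print(df.count())

# Write the DataFrame back out so the job runs end to end and produces diagnostics
output_dataset = dataiku.Dataset("my_output")  # placeholder dataset name
dkuspark.write_with_schema(output_dataset, df)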
Once it has run, please grab the job diagnostics:
https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html
Raise a ticket and share this with support directly (not on the Community): https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html
Thanks