Hive/Dremio table to PySpark DataFrame
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
# Read recipe inputs
internal = dataiku.Dataset("internal22")  # internal22 is a Hive table
internal_df = dkuspark.get_dataframe(sqlContext, internal)
internal_df.count()  # returns 0, but the table actually has millions of records
Alexandru (Dataiker)
Hi @sigma_loge,
Could you try running the same or similar basic Spark code in a PySpark recipe?
Once it has run, please grab the job diagnostics.
Then raise a ticket and share the diagnostics with support directly (not on the Community).
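For reference, the basic check suggested above could look roughly like the sketch below. It is only an illustration, not an official template: `load_input` and `summarize` are hypothetical helper names, the dataset name `internal22` is taken from the question, and `load_input` only works inside DSS, where the `dataiku` package and Spark are available.

```python
def load_input(dataset_name):
    """Load a DSS dataset as a Spark DataFrame (only works inside DSS)."""
    # Imports are kept local so summarize() can be used without DSS or Spark.
    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)
    return dkuspark.get_dataframe(sqlContext, dataiku.Dataset(dataset_name))


def summarize(df):
    """Collect a few facts about the DataFrame that are worth noting in a ticket."""
    return {"columns": list(df.columns), "row_count": df.count()}


# Inside a PySpark recipe (assumes "internal22" is an input of the recipe):
# print(summarize(load_input("internal22")))
```

Printing the column list next to the row count makes it easier to see whether the schema is being read correctly even though the count comes back as 0.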