Hive/Dremio table to pyspark Dataframe


import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs
internal = dataiku.Dataset("internal22")  # internal22 is a Hive table
internal_df = dkuspark.get_dataframe(sqlContext, internal)

internal_df.count()  # returns 0, but the table actually has millions of records



Hi @sigma_loge ,

Could you try running the same or a similar basic Spark snippet in a PySpark recipe and share the job diagnostics with support?

Once it has run, please grab the job diagnostics.

Then raise a ticket and share them with support directly (not on the Community).

