Hive/Dremio table to PySpark DataFrame
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs (internal22 is a Hive table)
internal = dataiku.Dataset("internal22")
internal_df = dkuspark.get_dataframe(sqlContext, internal)

internal_df.count()  # returns 0, but the table actually contains millions of records
Answers
Alexandru (Dataiker)
Hi @sigma_loge,
Could you try running the same or similar basic PySpark code in a PySpark recipe and share the job diagnostics with support?
https://doc.dataiku.com/dss/latest/code_recipes/pyspark.html#anatomy-of-a-basic-pyspark-recipe
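For reference, a minimal recipe along the lines of that doc page might look like the sketch below; the dataset names "my_input" and "my_output" are placeholders for your own datasets:

import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read the recipe input into a Spark DataFrame
input_dataset = dataiku.Dataset("my_input")  # placeholder dataset name
df = dkuspark.get_dataframe(sqlContext, input_dataset)

# Quick sanity checks on what Spark actually sees
df.printSchema()
print(df.count())

# Write the DataFrame back out so the job runs end to end and produces diagnostics
output_dataset = dataiku.Dataset("my_output")  # placeholder dataset name
dkuspark.write_with_schema(output_dataset, df)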
Once it has run, please grab the job diagnostics:
https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html
Raise a ticket and share this with support directly (not on the Community): https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html
Thanks