Hello,
I have a problem with a Hive table:
- When I process the table with PySpark (e.g. df.count()), I get 0 rows, i.e. an empty DataFrame.
- When I then investigate with a Hive query (SELECT COUNT(*) FROM TABLE), I get the count of all the rows in that table.
Does anyone have a solution, or know why it behaves like this?
Thank you
Hi @Houssam_2000,
How are you creating the df? Please try to reproduce the issue in a test PySpark recipe and share the job diagnostics through a support ticket.
I couldn't reproduce the issue mentioned:
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Reuse the running Spark context, or create one if none exists
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs
dataset = dataiku.Dataset("dataset_name")
df = dkuspark.get_dataframe(sqlContext, dataset)

# Count the rows in the resulting Spark DataFrame
print(df.count())
This returns the expected number of rows in a PySpark notebook.
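If it helps to narrow things down, here is a rough sketch of a check you could run from the same notebook. It counts the same table through two Spark read paths: the DataFrame API and a Spark SQL query. The helper name compare_counts and the table name "my_db.my_table" are hypothetical, and it assumes you pass in a SparkSession created with Hive support. Note that both counts go through Spark, not through the Hive engine itself, so this only shows whether the two Spark paths agree with each other.

```python
def compare_counts(spark, table_name):
    """Count one table through two Spark read paths.

    `spark` is assumed to be a SparkSession with Hive support enabled.
    `table_name` is assumed to be a trusted identifier such as "db.table",
    since it is interpolated directly into the SQL text.
    Returns (dataframe_count, sql_count).
    """
    # Path 1: load the table as a DataFrame and count it
    df_count = spark.table(table_name).count()

    # Path 2: let Spark SQL run the same kind of query used on the Hive side
    rows = spark.sql("SELECT COUNT(*) AS n FROM {}".format(table_name)).collect()
    sql_count = rows[0]["n"]

    return df_count, sql_count


# Hypothetical usage inside a notebook:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# print(compare_counts(spark, "my_db.my_table"))
```

If both numbers are 0 while the Hive query returns rows, the difference is likely in how Spark reads the table rather than in how the df is built. If they differ from each other, how the df is created is the first thing to look at.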