
Writing PySpark DataFrames into a managed folder


How can I write Parquet files into a managed folder when the data is a PySpark DataFrame?


Hi @Sreyasha,

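You can achieve this by converting the Spark DataFrame to a local pandas DataFrame with the toPandas method and then writing it out with to_csv. Note that toPandas collects all of the data on the driver, so this approach suits DataFrames that fit in memory. Here is sample code that you can run in a notebook.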

import dataiku
import dataiku.spark as dkuspark
import pyspark
from pyspark.sql import SQLContext

# Load PySpark
sc = pyspark.SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Example: Read the descriptor of a Dataiku dataset
mydataset = dataiku.Dataset("Dataset1")

# And read it as a Spark dataframe
df = dkuspark.get_dataframe(sqlContext, mydataset)

folder = dataiku.Folder("folder-id")

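filename = "testfile.csv"
# Collect the Spark DataFrame to pandas, serialize it as CSV, and write the bytes to the folder
with folder.get_writer(filename) as w:
    w.write(df.toPandas().to_csv(index=False).encode("utf-8"))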

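If the DataFrame is too large to collect with toPandas, one possible alternative is to stream the rows and write them in batches. The following is only a minimal sketch: the helper name write_rows_as_csv and its batch_size parameter are hypothetical (not part of the Dataiku or PySpark APIs), and it assumes the writer object only needs a write(bytes) method, as the one returned by folder.get_writer is used above.

```python
import csv
import io


def write_rows_as_csv(rows, header, writer, batch_size=10000):
    """Encode rows as CSV and push them to `writer` in batches.

    `rows` is any iterable of tuples (PySpark Row objects are tuples),
    `header` is a list of column names, and `writer` is any object with
    a write(bytes) method. Returns the number of data rows written.
    """
    buffer = io.StringIO()
    csv_writer = csv.writer(buffer)
    csv_writer.writerow(header)
    count = 0
    for row in rows:
        csv_writer.writerow(row)
        count += 1
        if count % batch_size == 0:
            # Flush the current batch and reset the in-memory buffer
            writer.write(buffer.getvalue().encode("utf-8"))
            buffer.seek(0)
            buffer.truncate(0)
    remaining = buffer.getvalue()
    if remaining:
        # Flush whatever is left after the last full batch
        writer.write(remaining.encode("utf-8"))
    return count


# Assumed usage with the objects defined above (not run here):
# with folder.get_writer("testfile.csv") as w:
#     write_rows_as_csv(df.toLocalIterator(), df.columns, w)
```

With this approach only one batch of rows is held in memory at a time, at the cost of pulling partitions to the driver one after another.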

If you have any questions, please let us know.



