Trying to store a text file to a folder on S3 that is not a managed folder in Dataiku

aw30

Hi - I have the following code in a PySpark recipe, but it writes the contents of the file to two physical files instead of one. As the attached PNG shows, the two files at the top of the folder were copied over manually, while empinfot1 and empinfo5 were created by the code below. The managed folders worked fine but produced cryptic file names. How do I keep the output from being split into two files? I tried both write.mode("overwrite").text and write.csv.

# -*- coding: utf-8 -*-
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, column, concat, lit


sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs
headcount_for_intdelivery = dataiku.Dataset("headcount_for_intdelivery")
headcount_for_intdelivery_df = dkuspark.get_dataframe(sqlContext, headcount_for_intdelivery)

s3_path = 's3://mypath/EMP_INFO3.txt'

# Write dataset
# headcount_for_intdelivery_df.write.mode("overwrite").text(s3_path)
# s3_path is a module-level variable here, so it is referenced directly (no self.)
headcount_for_intdelivery_df.write.csv(path=s3_path, header=True, mode="overwrite", sep="|")
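A possible explanation, offered as a sketch rather than a confirmed fix: Spark writes one part file per DataFrame partition, so a DataFrame with two partitions produces two files. Coalescing to a single partition before writing should yield a single data file. The helper name `write_single_part_csv` is hypothetical and not part of Dataiku or PySpark; only `coalesce` and `write.csv` are real PySpark calls.

```python
def write_single_part_csv(df, path, sep="|"):
    """Write a Spark DataFrame as CSV from a single partition.

    Assumption: `df` is a pyspark.sql.DataFrame and `path` is a location
    the Spark job is allowed to write to.
    """
    # coalesce(1) merges all partitions into one, so Spark emits one part file.
    # All rows pass through a single task, which can be slow for large data.
    single_partition_df = df.coalesce(1)

    # Spark still treats `path` as a directory: the data lands in
    # path/part-*.csv next to marker files such as _SUCCESS.
    single_partition_df.write.csv(path=path, header=True, mode="overwrite", sep=sep)
```

In the recipe above this would be called as `write_single_part_csv(headcount_for_intdelivery_df, s3_path)`. Note that 's3://mypath/EMP_INFO3.txt' would still end up as a folder containing a generated part file name, so getting an exact file name would need a rename or copy step after the write.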
