Trying to store a text file to a folder on S3 that is not a managed folder in Dataiku

aw30 Dataiku DSS & SQL, Registered Posts: 49 ✭✭✭✭✭

Hi - I have the following code in a PySpark recipe, but it stores the contents of the file in two physical files. As you can see in the PNG, the two files at the top were manually copied into the folder, while empinfot1 and empinfo5 were created by the code below. The managed folders worked fine but produced cryptic names. How do I stop the output from splitting into two files? I tried both write.mode and write.csv.

# -*- coding: utf-8 -*-
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, column, concat, lit


sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs
headcount_for_intdelivery = dataiku.Dataset("headcount_for_intdelivery")
headcount_for_intdelivery_df = dkuspark.get_dataframe(sqlContext, headcount_for_intdelivery)

s3_path = 's3://mypath/EMP_INFO3.txt'

#Write dataset
#headcount_for_intdelivery_df.write.mode("overwrite").text(s3_path)
headcount_for_intdelivery_df.write.csv(path=s3_path, header="true", mode="overwrite", sep="|")
