
Store a JSON file or any file from DSS to an S3 bucket or NAS (UNC path)




How can I store a JSON file (or any other file) from Dataiku DSS to an S3 bucket or a NAS (UNC path)?





Hi Ananth,

Where are the files you want to save currently located? Are they accessible from within DSS?

In general, the easiest way to store files to a filesystem-like location (such as S3) from DSS is by writing to a managed folder from a code recipe.

In the following example, I'll demonstrate how to save a DSS dataset as a JSON file to an S3 bucket.

First, create a connection to the target S3 bucket (see our documentation for details). Second, from the Flow, select the dataset you want to save and create a Python recipe with a managed folder as its output; make sure that the managed folder is stored on your newly created S3 connection.


Third, add the following code to the body of the Python recipe. In short, this code reads the input dataset as a pandas dataframe, converts the dataframe into a JSON string, and then saves it as a JSON file in the managed folder (stored on the S3 bucket).

import dataiku
from io import StringIO

# Read recipe inputs
avocado_transactions = dataiku.Dataset("avocado_transactions")
df = avocado_transactions.get_dataframe()

# Convert the pandas dataframe to a json string
df_json = df.to_json()

# Write the json to the managed folder on S3
folder_on_s3 = dataiku.Folder("4sRgBwqi")
folder_on_s3.upload_stream("/df.json", StringIO(df_json))

Note: before using this code, replace the input dataset name ("avocado_transactions") and the folder ID ("4sRgBwqi") with your own.
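
Since the question also mentions "any file", here is a minimal sketch of the same idea for an arbitrary local file. It assumes only that the folder handle exposes upload_stream(path, file_like), as in the recipe above. The helper name upload_local_file and the InMemoryFolder class are hypothetical: the latter is a stand-in so the snippet can be tried outside DSS, and inside DSS you would pass a dataiku.Folder instead.

```python
import os
import tempfile


def upload_local_file(folder, local_path, target_path):
    """Stream a local file into a managed folder.

    `folder` is assumed to behave like a dataiku.Folder handle, i.e. to expose
    upload_stream(path, file_like) as used in the recipe above.
    """
    # Binary mode works for any file type (JSON, CSV, images, ...)
    with open(local_path, "rb") as stream:
        folder.upload_stream(target_path, stream)


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for a dry run outside DSS."""

    def __init__(self):
        self.files = {}

    def upload_stream(self, path, stream):
        # Keep the uploaded bytes in a dict instead of sending them to S3
        self.files[path] = stream.read()


# Dry run: write a small temporary file and "upload" it to the stand-in folder
demo_folder = InMemoryFolder()
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"status": "ok"}')
upload_local_file(demo_folder, tmp.name, "/status.json")
os.remove(tmp.name)
print(demo_folder.files["/status.json"])
```

In a recipe, the call would look something like upload_local_file(dataiku.Folder("4sRgBwqi"), local_path, "/status.json"), with the folder ID and paths replaced by your own.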

Hopefully this helps, although please let me know if I've misunderstood your question!



Hi @NedM ,


Thank you for sharing the details. I have created a webapp, and in its Python code we currently have:


with open('/app/dataiku/DSS_DATA_DIR/test.json', "a") as test:
    test.write("[" + data + "]")


Can this be updated to write directly to S3?
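
One possible direction, offered as a sketch rather than a confirmed answer: object storage such as S3 generally replaces whole objects instead of appending to them, so the "a" mode above has no direct equivalent. The snippet below emulates it by reading the current content, adding the new fragment, and uploading the result again. It assumes the folder handle offers list_paths_in_partition(), get_download_stream(path) and upload_stream(path, file_like), which is my understanding of the dataiku.Folder API; please check against your DSS version. The names append_json_fragment and InMemoryFolder are hypothetical, and the read-then-write sequence is not safe if several webapp requests write at the same time.

```python
from io import BytesIO


def append_json_fragment(folder, path, data):
    """Emulate open(path, "a") on a managed folder: read, extend, re-upload."""
    existing = b""
    # Only download when the file is already there (first write starts empty)
    if path in folder.list_paths_in_partition():
        with folder.get_download_stream(path) as stream:
            existing = stream.read()
    # Same payload as the webapp code: the data wrapped in square brackets
    updated = existing + ("[" + data + "]").encode("utf-8")
    folder.upload_stream(path, BytesIO(updated))


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for a dry run outside DSS."""

    def __init__(self):
        self.files = {}

    def list_paths_in_partition(self):
        return list(self.files)

    def get_download_stream(self, path):
        return BytesIO(self.files[path])

    def upload_stream(self, path, stream):
        self.files[path] = stream.read()


# Dry run: two successive writes end up in the same file
demo_folder = InMemoryFolder()
append_json_fragment(demo_folder, "/test.json", '{"a": 1}')
append_json_fragment(demo_folder, "/test.json", '{"b": 2}')
print(demo_folder.files["/test.json"].decode("utf-8"))
```

In the webapp, the folder argument would be something like dataiku.Folder("4sRgBwqi"), pointing at a managed folder stored on the S3 connection.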
