Where are the files you want to save currently located? Are they accessible from within DSS?
In general, the easiest way to write files to a filesystem-like location (such as S3) from DSS is to write them to a managed folder from a code recipe.
In the following example, I'll demonstrate how to save a DSS dataset as a JSON file to an S3 bucket.
First, create a connection to the target S3 bucket (see our documentation for details). Second, from the Flow, select the dataset you want to save and create a Python recipe with a managed folder as output; make sure that the managed folder is stored on your newly created S3 connection.
Third, add the following code to the body of the Python recipe. In short, this code reads the input dataset as a pandas DataFrame, converts the DataFrame to a JSON string, and then saves it as a JSON file to the managed folder (stored on the S3 bucket).
```python
import dataiku
from io import BytesIO

# Read recipe inputs
avocado_transactions = dataiku.Dataset("avocado_transactions")
df = avocado_transactions.get_dataframe()

# Convert the pandas DataFrame to a JSON string
df_json = df.to_json()

# Write the JSON to the managed folder on S3.
# The string is encoded to UTF-8 bytes because managed folder
# writers work with binary data under Python 3.
folder_on_s3 = dataiku.Folder("4sRgBwqi")
folder_on_s3.upload_stream("/df.json", BytesIO(df_json.encode("utf-8")))
```
Note: before using this code, you'll need to edit the name of the input dataset and the folder ID.
Hopefully this helps, although please let me know if I've misunderstood your question!
Hi @NedM,
Thank you for sharing the details. I have a webapp, and in its Python backend we have:
```python
with open('/app/dataiku/DSS_DATA_DIR/test.json', "a") as test:
    test.write("[" + data + "]")
```
Can this be updated to write directly to S3?
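An S3 object cannot be opened in append mode ("a") the way a local file can, so one option is to read the current file from the managed folder, add the new record, and upload the whole result again. Below is a minimal sketch, assuming the webapp backend has access to a managed folder stored on the S3 connection as described above. `append_json_record` and `InMemoryFolder` are hypothetical names; the latter is only a stand-in for `dataiku.Folder` so the helper can be exercised outside DSS.

```python
import io


def append_json_record(folder, path, data):
    """Append "[" + data + "]" to a file in a managed folder.

    Assumes `folder` is a dataiku.Folder (or any object exposing the
    same three methods). Object stores such as S3 have no append mode,
    so the existing content is read, extended, and re-uploaded.
    """
    existing = b""
    # Paths are typically listed with a leading slash, e.g. "/test.json"
    if path in folder.list_paths_in_partition():
        with folder.get_download_stream(path) as stream:
            existing = stream.read()
    new_content = existing + ("[" + data + "]").encode("utf-8")
    folder.upload_stream(path, io.BytesIO(new_content))


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for local testing only."""

    def __init__(self):
        self.files = {}

    def list_paths_in_partition(self):
        return list(self.files)

    def get_download_stream(self, path):
        return io.BytesIO(self.files[path])

    def upload_stream(self, path, stream):
        self.files[path] = stream.read()
```

Inside DSS you would pass `dataiku.Folder("<folder ID>")` instead of the stand-in, with the folder ID of your own S3-backed folder. Note that this read-modify-write is not atomic: two concurrent webapp requests can overwrite each other's records, so if writes are frequent, uploading one file per record may be the safer design.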