Store a JSON file or any file from DSS to an S3 bucket or NAS (UNC path)

ananth Registered Posts: 41 ✭✭✭✭✭


How can I store a JSON file, or any other file, from DSS to an S3 bucket or a NAS (UNC path) from DataIku TShirt?




  • NedM
    NedM Dataiker Posts: 8 Dataiker
    edited July 17

    Hi Ananth,

    Where are the files you want to save currently located? Are they accessible from within DSS?

    In general, the easiest way to store files to a filesystem-like location (such as S3) from DSS is by writing to a managed folder from a code recipe.

    In the following example, I'll demonstrate how to save a DSS dataset as a JSON file to an S3 bucket.

    First, create a connection to the target S3 bucket (see our documentation for details). Second, from the Flow, select the dataset you want to save and create a Python recipe with a managed folder as its output. Make sure the managed folder is stored on your newly created S3 connection.


    Third, add the following code to the body of the Python recipe. In short, this code reads the input dataset as a pandas dataframe, converts the dataframe into a JSON string, and then saves that string as a JSON file in the managed folder (stored on the S3 bucket).

    import dataiku
    from io import StringIO
    # Read recipe inputs
    avocado_transactions = dataiku.Dataset("avocado_transactions")
    df = avocado_transactions.get_dataframe()
    # Convert the pandas dataframe to a json string
    df_json = df.to_json()
    # Write the json to the managed folder on S3
    folder_on_s3 = dataiku.Folder("4sRgBwqi")
    folder_on_s3.upload_stream("/df.json", StringIO(df_json))

    Note: before using this code, you'll need to edit the name of the input dataset, and the folder ID.
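
The same pattern should extend to JSON content that does not come from a dataset. Below is a minimal sketch, assuming the folder object exposes `upload_data(path, data)` the way `dataiku.Folder` does. The helper name `save_json_to_folder` and the `FakeFolder` class are hypothetical; `FakeFolder` is only an in-memory stand-in so the snippet runs outside DSS. Inside a recipe you would pass your own `dataiku.Folder(...)` object instead.

```python
import json


def save_json_to_folder(folder, path, payload):
    """Serialize payload to JSON and upload it as UTF-8 bytes to a managed folder."""
    data = json.dumps(payload, indent=2).encode("utf-8")
    # upload_data takes the target path inside the folder and the raw bytes
    folder.upload_data(path, data)
    return len(data)


class FakeFolder:
    """Hypothetical in-memory stand-in for dataiku.Folder, for local testing only."""

    def __init__(self):
        self.files = {}

    def upload_data(self, path, data):
        self.files[path] = bytes(data)


# Example payload (made up for illustration)
demo_folder = FakeFolder()
size = save_json_to_folder(demo_folder, "/example.json", {"region": "West", "units": 3})
```

Because the helper only relies on `upload_data`, it does not care whether the managed folder lives on S3 or on another connection type; that choice is made when the folder is created.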

    Hopefully this helps, although please let me know if I've misunderstood your question!


  • ananth
    ananth Registered Posts: 41 ✭✭✭✭✭

    Hi @NedM

    Thank you for sharing the details. I have created a webapp, and in its Python code we have:

    with open('/app/dataiku/DSS_DATA_DIR/test.json', "a") as test:
        test.write("[" + data + "]")

    Can this be updated to write directly to S3?
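
One possible approach, offered as a sketch rather than a confirmed answer: S3 objects generally cannot be appended to in place, so the append-mode `open(..., "a")` call has no direct equivalent. A common workaround is to read the existing object, add the new text, and upload the result again through a managed folder on the S3 connection. The helper name `append_text_to_folder` and the `FakeFolder` class below are hypothetical; the stand-in mimics only the three `dataiku.Folder` methods the sketch relies on (`list_paths_in_partition`, `get_download_stream`, `upload_data`) so that it runs outside DSS.

```python
import io


def append_text_to_folder(folder, path, text):
    """Emulate append mode: read the current file (if any), add text, re-upload."""
    existing = b""
    if path in folder.list_paths_in_partition():
        with folder.get_download_stream(path) as stream:
            existing = stream.read()
    folder.upload_data(path, existing + text.encode("utf-8"))


class FakeFolder:
    """Hypothetical in-memory stand-in for dataiku.Folder, for local testing only."""

    def __init__(self):
        self.files = {}

    def list_paths_in_partition(self):
        return list(self.files)

    def get_download_stream(self, path):
        return io.BytesIO(self.files[path])

    def upload_data(self, path, data):
        self.files[path] = bytes(data)


# Mirrors the original snippet: each call appends "[" + data + "]"
demo_folder = FakeFolder()
for data in ('{"id": 1}', '{"id": 2}'):
    append_text_to_folder(demo_folder, "/test.json", "[" + data + "]")
```

Note that this read-then-write cycle is not safe if several webapp requests write at the same time, and it re-uploads the whole file on every call. Writing each payload to its own file in the folder would avoid both issues.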
