Writing Data to S3 from Dataiku

Ankur30
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner

Good morning,

I am writing/appending data to an S3 bucket from Dataiku, but every time I run my Sync recipe, a new CSV file is created. I want the data to go into a single CSV file every time I run the Sync recipe.

Kindly help me with a solution. Please find the attached screenshot.

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
    edited July 17

    Hi,

    The error suggests you are using code that writes to the local filesystem.

    For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.

    For example, use upload_stream() or upload_file(). See https://doc.dataiku.com/dss/latest/python-api/managed_folders.html for more details.

    Here is a generic example:

    ```
    import dataiku

    # ID of the S3-backed managed folder to write to
    managed_folder_id = "URKU7Oqb"

    # Read the input dataset into a pandas DataFrame
    my_dataset = dataiku.Dataset("customers_labeled_prepared")
    df = my_dataset.get_dataframe()

    # Serialize the DataFrame to CSV bytes in memory (no local file involved)
    csv_bytes = df.to_csv(index=False).encode("utf-8")

    # Write the recipe output through the managed folder API, which also works for S3 folders
    output_folder = dataiku.Folder(managed_folder_id)
    output_folder.upload_stream("some_name.csv", csv_bytes)
    ```
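
    Note that upload_stream() writes the whole file at the given path and replaces any existing file, so reusing the same name keeps a single file but does not append to it. If the single CSV should grow across runs, one option is a read-modify-write step, sketched below under some assumptions: it relies on Folder.list_paths_in_partition() and Folder.get_download_stream() from the managed folder API, while merge_csv_bytes and append_df_to_folder_csv are hypothetical helper names. The whole file is rewritten on every run, so this only suits modest file sizes and one writer at a time.

    ```python
    import io


    def merge_csv_bytes(existing: bytes, new: bytes) -> bytes:
        """Append the data rows of `new` to `existing`, keeping a single header line."""
        if not existing.strip():
            # Nothing written yet: keep the new CSV as-is, header included
            return new
        existing_header = existing.split(b"\n", 1)[0].rstrip(b"\r")
        new_header, _, new_body = new.partition(b"\n")
        if new_header.rstrip(b"\r") != existing_header:
            raise ValueError("CSV headers differ; refusing to append")
        if not existing.endswith(b"\n"):
            existing += b"\n"
        # Drop the header of the new chunk and append only its data rows
        return existing + new_body


    def append_df_to_folder_csv(folder_id: str, path: str, df) -> None:
        """Hypothetical helper: append a DataFrame to one CSV file in a managed folder."""
        import dataiku  # only importable inside DSS

        folder = dataiku.Folder(folder_id)
        existing = b""
        # Compare paths without the leading slash, since listed paths may include one
        if any(p.lstrip("/") == path.lstrip("/") for p in folder.list_paths_in_partition()):
            with folder.get_download_stream(path) as stream:
                existing = stream.read()
        new_bytes = df.to_csv(index=False).encode("utf-8")
        folder.upload_stream(path, io.BytesIO(merge_csv_bytes(existing, new_bytes)))
    ```

    In a recipe this would be called as append_df_to_folder_csv("URKU7Oqb", "some_name.csv", df). The header check is there so that a schema change fails loudly instead of silently producing a CSV with mismatched columns.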

Answers

  • Alexandru

    Hi Ankur,

    By default, new data is written to new files each time you sync to an S3 dataset.

    To change this behavior, edit the settings of the output dataset under Advanced > Force single output file. You can also set the file base name there.

    Please refer to the screenshot below:

    Screenshot 2021-10-26 at 12.07.13.png

    Let me know if that works for you.

  • Ankur30

    Hi @AlexT,

    Thanks for this, but I want to write all the input DSS datasets in CSV format to my S3 bucket using a Python recipe, and I am getting an error while writing. Attached is a screenshot of the error message.

    Regards,

    Ankur.
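
    Writing several datasets from one Python recipe can reuse the same Folder API in a loop. A minimal sketch, assuming the dataset names are listed by hand (the names in the usage line are hypothetical placeholders) and that each dataset fits in memory as a pandas DataFrame; csv_path_for and export_datasets_to_folder are illustrative helper names, not part of the Dataiku API.

    ```python
    import io


    def csv_path_for(dataset_name: str) -> str:
        """Map a dataset name (optionally "PROJECTKEY.name") to a CSV file name."""
        return dataset_name.rsplit(".", 1)[-1] + ".csv"


    def export_datasets_to_folder(dataset_names, folder_id: str) -> None:
        """Write each dataset as its own CSV file into one managed folder."""
        import dataiku  # only importable inside DSS

        folder = dataiku.Folder(folder_id)
        for name in dataset_names:
            df = dataiku.Dataset(name).get_dataframe()
            csv_bytes = df.to_csv(index=False).encode("utf-8")
            # Goes through the folder API rather than the local filesystem,
            # so it also works when the folder is backed by S3
            folder.upload_stream(csv_path_for(name), io.BytesIO(csv_bytes))


    # Hypothetical usage inside a Python recipe:
    # export_datasets_to_folder(["customers_labeled_prepared", "orders"], "URKU7Oqb")
    ```

    Each dataset gets its own file name, so rerunning the recipe overwrites those same files instead of creating new ones.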

  • Ankur30

    Hi @AlexT,

    Thank you for all the help and support you have provided so far. I look forward to your continued support and really appreciate it.

    Thank You,

    Ankur.
