Read CSVs from S3 folder, process, write processed CSVs to S3 folder

meevans1 Posts: 5

How should I:

1. Read CSVs from an S3 folder

2. Process these CSVs with custom Python code

3. Write the processed CSVs to an S3 folder, presumably a different folder from the input

Thanks in advance

Best Answer

  • Ignacio_Toledo
    Ignacio_Toledo Neuron Posts: 412
    Answer ✓

    Hi @meevans1. I think it is important to add one extra piece of information to the process described by @MiguelangelC, and it has to do with the API calls you will need to use to read and write data in an S3 bucket connected as a folder in Dataiku: you'll need "get_download_stream" and "upload_stream" for the read and write operations, or you can install the boto3 library instead (see the sketch at the end of this answer).

    Hope this helps, once you have followed the instructions from @MiguelangelC.

    Cheers.
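
    Just to illustrate, here is a minimal sketch of what this could look like in a Python recipe, assuming two managed folders pointing at the input and output S3 paths (the folder names "s3_input" and "s3_output" are placeholders):

    ```python
    import io

    import dataiku
    import pandas as pd

    # Placeholder folder names -- replace them with your own managed folders
    # that point at the S3 input and output paths.
    input_folder = dataiku.Folder("s3_input")
    output_folder = dataiku.Folder("s3_output")

    for path in input_folder.list_paths_in_partition():
        if not path.endswith(".csv"):
            continue

        # get_download_stream returns a file-like object to read from
        with input_folder.get_download_stream(path) as stream:
            df = pd.read_csv(stream)

        # ... your custom processing on df goes here ...

        # upload_stream writes a file-like object back to the output folder
        data = df.to_csv(index=False).encode("utf-8")
        output_folder.upload_stream(path, io.BytesIO(data))
    ```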

Answers

  • Miguel Angel
    Miguel Angel Dataiker Posts: 118

    Hi,

    1)
    In order to connect to an S3 bucket, you first need to have such a connection defined in your DSS instance.
    You can create a new connection by going to Administration > Connections > New Connection > Amazon S3. The documentation covers the prerequisites: https://doc.dataiku.com/dss/latest/connecting/s3.html

    Once you have set up the connection, you can create a dataset or folder from the Flow that points to the S3 connection and to the particular file/path in the bucket.

    2)
    This can be done with either a code recipe or a notebook, depending on your requirements.

    3)
    Provided you are using the same details as the existing S3 connection, you can reuse it to write the data wherever you want in the bucket (see the sketch at the end of this answer).

    Since these questions deal with the basic functionalities of DSS, I think you'd benefit greatly from going through the basic DSS learning path: https://academy.dataiku.com/path/core-designer
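
    As an illustration of steps 2) and 3), here is a minimal sketch of a Python recipe that reads an S3-backed input dataset into a pandas DataFrame, applies custom processing, and writes the result to an output dataset built on the same S3 connection (the dataset names below are placeholders):

    ```python
    import dataiku

    # Placeholder dataset names -- both datasets are assumed to be defined
    # in the Flow on top of the S3 connection, with different paths.
    input_dataset = dataiku.Dataset("s3_csv_input")
    output_dataset = dataiku.Dataset("s3_csv_output")

    # Read the input CSV data into a pandas DataFrame
    df = input_dataset.get_dataframe()

    # ... your custom processing on df goes here ...

    # Write the processed data to the output dataset (stored back on S3)
    output_dataset.write_with_schema(df)
    ```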
