Read CSVs from S3 folder, process, write processed CSVs to S3 folder
How should I:
1. Read CSVs from an S3 folder
2. Process these CSVs with custom python code
3. Write these processed CSVs to an S3 folder. A different folder to the input I guess.
Thanks in advance
Best Answer
Ignacio_Toledo
Hi @meevans1. I think it is important to add one extra piece of information to the process described by @MiguelangelC. It concerns the API calls you will need to read and write data in an S3 bucket connected as a Folder in Dataiku: use "get_download_stream" and "upload_stream" for the reading and writing operations, or install the boto3 library.
Hope this helps, after following the instructions from @MiguelangelC.
Cheers.
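For illustration, here is a minimal sketch of how the two calls mentioned above could fit together in a Python recipe. It is only an outline under some assumptions: the folder names "csv_input" and "csv_output" are hypothetical, `process_rows` is a placeholder for your own custom code, and the files are assumed to be UTF-8 encoded. It uses `list_paths_in_partition` to enumerate the files in the input folder.

```python
import csv
import io


def process_rows(rows):
    """Placeholder for the custom processing (hypothetical): trim whitespace in every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def process_folder(input_folder, output_folder):
    """Read each CSV from input_folder, process it, and write it to output_folder.

    Both arguments are expected to behave like dataiku.Folder objects.
    Returns the list of paths that were written.
    """
    written = []
    for path in input_folder.list_paths_in_partition():
        if not path.lower().endswith(".csv"):
            continue  # ignore anything that is not a CSV file

        # Read the whole file through the folder API (assumes UTF-8 text).
        with input_folder.get_download_stream(path) as stream:
            text = stream.read().decode("utf-8")

        rows = list(csv.reader(io.StringIO(text)))

        # Serialize the processed rows back to CSV in memory.
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(process_rows(rows))

        # Upload under the same relative path in the output folder.
        payload = io.BytesIO(buffer.getvalue().encode("utf-8"))
        output_folder.upload_stream(path, payload)
        written.append(path)
    return written


def run_in_dss():
    """Entry point for a DSS Python recipe; the folder names are hypothetical."""
    import dataiku  # only available inside a DSS code environment

    return process_folder(dataiku.Folder("csv_input"), dataiku.Folder("csv_output"))
```

In a recipe you would call `run_in_dss()` with the names of your own input and output folders. Keeping the loop in `process_folder` separate from the `dataiku` import means the logic can be tried outside DSS with any object that offers the same three methods.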
Answers
Miguel Angel (Dataiker)
Hi,
1)
To connect to an S3 bucket, you first need to have such a connection defined in your DSS instance.
You can create new connections in your node by going to Administration > Connections > New Connection > Amazon S3. The documentation lists the prerequisites: https://doc.dataiku.com/dss/latest/connecting/s3.html
Once you have set up the connection, you can create a dataset or folder from the Flow that points to the S3 connection and to the particular file or path in the bucket.
2)
This can be done with either a code recipe or a notebook, depending on your requirements.
3)
Provided you use the same details as the existing S3 connection, you can reuse it to write the data wherever you want in the bucket.
Since these questions deal with the basic functionalities of DSS, I think you'd benefit greatly from going through the basic DSS learning path: https://academy.dataiku.com/path/core-designer