How should I:
1. Read CSVs from an S3 folder
2. Process these CSVs with custom Python code
3. Write the processed CSVs to an S3 folder (a different folder from the input, presumably)
Thanks in advance
Hi @meevans1. I think it is important to add one extra piece of information to the process described by @MiguelangelC, and it has to do with the API calls you will need to read and write data in an S3 bucket connected as a Folder in Dataiku: use "get_download_stream" and "upload_stream" for the reading and writing operations, or install the boto3 library.
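For reference, here is a minimal sketch of those two calls; the folder name "my_s3_folder" and the file paths are illustrative assumptions, so substitute your own:

```python
import io
import dataiku

# "my_s3_folder" is a hypothetical managed folder name -- replace with your own
folder = dataiku.Folder("my_s3_folder")

# Read a file out of the S3-backed folder as a stream
with folder.get_download_stream("input/data.csv") as stream:
    raw = stream.read()

# Write bytes back to the folder through a file-like object
folder.upload_stream("output/data.csv", io.BytesIO(raw))
```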
Hope this helps, after following the instructions from @MiguelangelC.
Cheers.
Hi,
1)
To connect to an S3 bucket, you first need such a connection defined in your DSS instance.
You can create new connections in your node by going to Administration > Connections > New Connection > Amazon S3. The documentation covers the prerequisites: https://doc.dataiku.com/dss/latest/connecting/s3.html
Once the connection is set up, you can create a dataset or managed folder from the Flow that points to the S3 connection and the particular file/path in the bucket.
2)
This can be done with either a code recipe or a notebook, depending on your requirements.
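For instance, a Python recipe could stream each CSV out of the input folder and load it into pandas. This is a minimal sketch: the managed folder name "s3_input" is an assumption, and dropna is just a placeholder for your custom processing.

```python
import pandas as pd
import dataiku

# "s3_input" is a hypothetical managed folder ID -- replace with your own
input_folder = dataiku.Folder("s3_input")

dataframes = {}
for path in input_folder.list_paths_in_partition():
    if not path.endswith(".csv"):
        continue
    # Stream the CSV out of S3 and parse it with pandas
    with input_folder.get_download_stream(path) as stream:
        df = pd.read_csv(stream)
    # Custom processing goes here; dropna is only a placeholder
    dataframes[path] = df.dropna()
```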
3)
Provided you use the already existing S3 connection, you can reuse it to write the data wherever you want in the bucket.
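Continuing the sketch from step 2, the processed CSVs can then be uploaded to a second managed folder on the same connection; the folder name "s3_output" and the "processed/" prefix are assumptions.

```python
import io
import dataiku

# "s3_output" is a hypothetical managed folder on the same S3 connection
output_folder = dataiku.Folder("s3_output")

# "dataframes" is the dict built in the step 2 sketch above
for path, df in dataframes.items():
    data = df.to_csv(index=False).encode("utf-8")
    # upload_stream expects a file-like object, hence the BytesIO wrapper
    output_folder.upload_stream("processed/" + path.lstrip("/"), io.BytesIO(data))
```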
Since these questions deal with the basic functionalities of DSS, I think you'd benefit greatly from going through the basic DSS learning path: https://academy.dataiku.com/path/core-designer