Read / write datasets in shell recipe
Hello,
The doc about shell recipes is quite light.
Could someone help me, please? I would like to read data from an input dataset and write data to an output dataset.
Is that possible if the datasets are stored in Google Cloud Storage?
Many thanks for your help.
C.
Best Answer
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,218 Dataiker
Hi,
First, it would be good to understand whether you really need to resort to a shell recipe to achieve what you are trying to do. A Python recipe would offer much more flexibility, especially when dealing with datasets.
With a shell recipe you can read from and write to remote datasets (S3, GCP, etc.), but you cannot read directly from remote managed folders. Use a Python recipe for those instead.
The relevant section of the existing documentation is:
https://doc.dataiku.com/dss/latest/code_recipes/shell.html#piping-a-dataset-in-and-out
In the example below, I am reading a TSV file from S3 and writing all of the lines back to another S3 dataset.
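A minimal sketch of what such a script could look like (this is not the original example). It assumes the recipe is configured to pipe the input dataset to standard input and to pipe standard output into the output dataset, as the linked documentation section describes. The function name copy_rows and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the input dataset arrives on stdin as
# tab-separated lines and that stdout is written to the output dataset.

# copy_rows (hypothetical name): echo every input line unchanged.
# Any per-line processing would go inside the loop.
copy_rows() {
    # The extra test keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done
}

# Local demonstration with made-up rows. Inside the recipe, the body
# would simply call copy_rows with no redirection.
printf 'id\tname\n1\talice\n2\tbob\n' | copy_rows
```

Because the script only touches standard input and output, it does not need to know where the datasets are stored. The connection (S3, GCS, and so on) is handled by the recipe's input and output settings.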
Let me know if this helps
Answers
Hi @AlexT
I like to use shell recipes when I just need to handle simple actions on files.
In this case I only needed to create a simple txt file from a dataset.
Your sample code is perfect and allowed me to understand how to read and write.
Many thanks
C.