Read / write datasets in shell recipe

Chiktika Registered Posts: 24 ✭✭✭✭

Hello,

The doc about shell recipes is quite light.
Can someone please help me? I would like to read data from an input dataset and write data to an output dataset.

Is it possible to do that if the datasets are stored in Google Cloud Storage?

Many thanks for your help.

C.


Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi,

    First, it would be good to understand whether you really need to resort to a shell recipe to achieve what you are trying to do. A Python recipe would offer much more flexibility, especially when dealing with datasets.

    With a shell recipe you can read from and write to remote datasets (S3, GCP, etc.), but you cannot access remote managed folders directly. You should use a Python recipe for that instead.

    The relevant section of the existing documentation is:

    https://doc.dataiku.com/dss/latest/code_recipes/shell.html#piping-a-dataset-in-and-out

    In the example below, I am reading a TSV file from S3 and writing all of its lines back to another S3 dataset.

    [Screenshot of the example recipe, taken 2021-06-18]
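    The screenshot itself is not reproduced here. The following is a minimal sketch in the same spirit, not the code from the screenshot. It assumes the recipe is set up as in the linked "piping a dataset in and out" section, so that the input dataset's rows arrive on stdin (assumed here to be tab-separated lines) and whatever the script prints on stdout is written to the output dataset. The function name `copy_all_rows` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a shell recipe body. Assumes the input dataset is piped into
# stdin and stdout is piped into the output dataset (recipe settings);
# the tab-separated row format is an assumption, check the DSS docs.
set -euo pipefail

# Hypothetical helper: copy every incoming row to the output unchanged,
# i.e. "write back all of the lines" from input to output.
copy_all_rows() {
  cat
}
```

    Inside the recipe, the last command would simply be `copy_all_rows` (or just `cat`). Because the recipe only sees stdin and stdout, the same script should work whether the datasets live on S3, GCS, or elsewhere.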

    Let me know if this helps.

Answers

  • Chiktika
    Chiktika Registered Posts: 24 ✭✭✭✭

    Hi @AlexT

    I like to use shell recipes when I just need to handle simple actions on files.

    In this case, I only needed to create a simple txt file from a dataset.
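    For a use case like this, one possible sketch (assuming the same piping setup, with rows arriving on stdin as tab-separated lines) is to turn each row into a plain line of text before it is written out. The function name `rows_to_text` and the choice of a single space as the separator are illustrative assumptions, not part of the DSS documentation.

```shell
#!/usr/bin/env bash
# Sketch: convert tab-separated rows piped in from a dataset into plain
# text lines. Assumes "pipe in"/"pipe out" are configured on the recipe.
set -euo pipefail

# Hypothetical helper: replace every tab with a single space so each
# row becomes one human-readable line of text.
rows_to_text() {
  tr '\t' ' '
}
```

    In the recipe, calling `rows_to_text` as the last command would send the converted lines to the output.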

    Your sample code is perfect and allowed me to understand how to read and write.

    Many thanks

    C.
