Update partitions in Python code

sunith992 Dataiku DSS Core Designer, Registered Posts: 20 ✭✭✭✭

Hi

I would like to ingest all partitions into Python dataframes, update these partitions by applying some calculations, and then write the updated partitions back to the output DSS dataset.

Can anyone please help with the code/functions to read the partitions and write the updates back to the output dataset?


Best Answer

  • Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118

    Hi,

    It depends on whether you are using a notebook or a recipe.

    In a notebook, you must specify the partition you want to write to using the 'set_write_partition' method. You also need a write handle for that dataset, which you create with the 'get_writer' method.

    The easiest way to process all partitions is to loop across them. For example, see the code in Capture.PNG.
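    As a rough illustration, a minimal sketch of such a loop could look like the following (the dataset names 'partitioned_input' and 'partitioned_output' and the calculation step are placeholders, not the exact code from the attachment):

        import dataiku

        input_dataset = dataiku.Dataset("partitioned_input")
        output_dataset = dataiku.Dataset("partitioned_output")

        # Loop across every partition of the input dataset
        for partition in input_dataset.list_partitions():
            # Restrict reading to the current partition
            input_dataset.read_partitions = [partition]
            df = input_dataset.get_dataframe()

            # Apply your calculations here, e.g. df = my_calculation(df)

            # Make sure the output schema matches the dataframe
            output_dataset.write_schema_from_dataframe(df)

            # Point the writer at the matching output partition
            output_dataset.set_write_partition(partition)
            with output_dataset.get_writer() as writer:
                writer.write_dataframe(df)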

    In a code recipe you do not need to specify the partition to write: it is taken care of by the logic of the Flow, that is, the definition of the partitions to build and the repartitioning mode (e.g. Equals, Time range, Explicit values).
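    For comparison, a recipe-style sketch with the same placeholder dataset names could be as simple as:

        import dataiku

        # In a recipe, these handles correspond to the recipe's Flow input and output
        input_dataset = dataiku.Dataset("partitioned_input")
        output_dataset = dataiku.Dataset("partitioned_output")

        # DSS reads only the partition(s) requested by the build
        df = input_dataset.get_dataframe()

        # Apply your calculations here, e.g. df = my_calculation(df)

        # DSS routes the write to the partition being built; no set_write_partition needed
        output_dataset.write_with_schema(df)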

Answers

  • sunith992 Dataiku DSS Core Designer, Registered Posts: 20 ✭✭✭✭

    Hi Miguel,

    Thanks a lot for the above response, it helped me. However, I am now getting an error due to a schema inconsistency. I am wondering if there are steps like 'infer_schema' or 'Dropandrecreate' that could be used within the same logic to overcome it. Please help. Thanks.

  • sunith992 Dataiku DSS Core Designer, Registered Posts: 20 ✭✭✭✭

    What does the 'get_writer' method do, and where can I find more details about it?

    Also, why should I use this writer? It worked without get_writer when I used the write_with_schema function instead.
