Add ID for row number in DSS dataframe

ele_f Registered Posts: 17 ✭✭✭✭

Hi,

In a DSS recipe, is it possible to add a column to a dataframe with an ID indicating the row number?

At the moment I am using R, with this command:


data <- tibble::rowid_to_column(data, "ID")

but it would be great to have this function built into the DSS Prepare recipe.

Thanks

Answers

  • AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    edited July 17

    Hi,

    In a Prepare recipe, you can do that with a Python formula step, provided that your data is not processed in parallel. If your input dataset is a filesystem dataset, check the "Preserve ordering" option in its Advanced settings, then add a Python step (in "cell" mode) to your Prepare recipe, with code like this:


    count = 0
    def process(row):
        # Cell mode: the returned value fills the output column for this row.
        global count
        count = count + 1
        return count

    You can find more info on Data ordering here.
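
To see why ordering matters, note that the counter only yields row numbers if process() is called once per row, sequentially, in a single process. Here is a minimal sketch outside DSS that simulates this; the sample `rows` list and the loop are assumptions for illustration only, not part of the DSS API:

```python
count = 0

def process(row):
    """Return the 1-based position of the row, in the order rows are seen."""
    global count
    count += 1
    return count

# Hypothetical sample rows standing in for the recipe's input dataset.
rows = [{"name": "a"}, {"name": "b"}, {"name": "c"}]

# Simulate the step being applied to each row, one after another.
for row in rows:
    row["ID"] = process(row)

ids = [row["ID"] for row in rows]
print(ids)
```

If the rows were split across several workers, each worker would keep its own `count`, so IDs would repeat. That is why the data must not be parallelized.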

  • AdrienP Dataiker, Registered Posts: 1 Dataiker

    Since Dataiku V7, this can be done visually in the Prepare recipe: use the "Enrich record with context" processor and set its "Output file record column".
