Add a column to a dataset and update that column in an incremental way

deep_215 Registered Posts: 9 ✭✭✭

Hi

I am stuck in a situation where I need to add a column to the dataset, and that column has to be updated incrementally based on its value in the previous row.

For example:

Input table

Column 1    Column 2    Column 3
1           ABC         1/12/2022
2           DEF         3/7/2022

Here I want to add a new column, say Column 4, that is incremented 4 times starting from the Column 3 value.

That is, Column 3 is in date format and acts as the base value for Column 4.

Output dataset

Column 1    Column 2    Column 3     Column 4
1           ABC         1/12/2022    2/12/2022
1           ABC         1/12/2022    3/12/2022
1           ABC         1/12/2022    4/12/2022
1           ABC         1/12/2022    5/12/2022
2           DEF         3/7/2022     4/7/2022
2           DEF         3/7/2022     5/7/2022
2           DEF         3/7/2022     6/7/2022
2           DEF         3/7/2022     7/7/2022

If you look at the rows for 1, Column 3 has the value 1/12/2022 and Column 4 is incremented by 1 month each time, giving 4 rows for 1.

Thanks in advance

Answers

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
    edited July 17

    Hi,

    I would do this with a custom Python step (or create a plugin).

    Alternatively, you can create a new column 'i' holding the values 1, 2, 3, 4 and then use the 'increment date' step:

    [Screenshot: Screen Shot 2022-04-29 at 11.27.31 AM.png]

    For the Python step, something like the following should work, although you'll have to adjust it to handle dates:

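    def process(row):
        # Parameters: how many output rows to produce per input row,
        # and which numeric field holds the base value.
        num_rows = 4
        field_to_increment = 'price_first_item_purchased'

        ret = []
        for i in range(num_rows):
            # Copy the row so each output row keeps its own value,
            # then add 1, 2, 3, ... to the base value.
            newrow = dict(row)
            newrow['new_column'] = float(row[field_to_increment]) + (i + 1)
            ret.append(newrow)

        return ret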
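
For the date case in the question, a minimal sketch of the same idea might look like the following. It makes several assumptions: that the dates are strings in month/day/year order (which is what the expected output suggests), that the columns are literally named 'Column 3' and 'Column 4', and that the step is expected to return a list of rows as above. The add_months helper is a hypothetical name, not part of any Dataiku API; it uses only the Python standard library and clamps the day when the target month is shorter.

```python
import calendar
from datetime import date, datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: dates are month/day/year strings


def add_months(d, months):
    """Shift d forward by `months`, clamping the day to the end of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def process(row):
    # Assumed parameters; rename the columns to match your dataset.
    num_rows = 4
    date_field = "Column 3"
    new_field = "Column 4"

    base = datetime.strptime(row[date_field], DATE_FORMAT).date()

    ret = []
    for i in range(1, num_rows + 1):
        # One output row per month offset: base + 1 month, + 2 months, ...
        newrow = dict(row)
        shifted = add_months(base, i)
        # Write the date back without zero padding, as in the question.
        newrow[new_field] = f"{shifted.month}/{shifted.day}/{shifted.year}"
        ret.append(newrow)

    return ret
```

Each input row is emitted num_rows times, and every copy keeps the original Column 3 value while Column 4 moves forward one more month than the previous copy.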
