Add a column to a dataset and update that column in an incremental way
Hi
I am stuck in a situation where I need to add a column to a dataset, and that column has to be updated incrementally based on its value in the previous row.
For example:
Input table
Column 1 | Column 2 | Column 3 |
1 | ABC | 1/12/2022 |
2 | DEF | 3/7/2022 |
Here I want to add a new column, say column 4, that is incremented 4 times starting from the column 3 value.
That is, column 3 is in date format and acts as the base value for column 4.
Output dataset
Column 1 | Column 2 | Column 3 | Column 4 |
1 | ABC | 1/12/2022 | 2/12/2022 |
1 | ABC | 1/12/2022 | 3/12/2022 |
1 | ABC | 1/12/2022 | 4/12/2022 |
1 | ABC | 1/12/2022 | 5/12/2022 |
2 | DEF | 3/7/2022 | 4/7/2022 |
2 | DEF | 3/7/2022 | 5/7/2022 |
2 | DEF | 3/7/2022 | 6/7/2022 |
2 | DEF | 3/7/2022 | 7/7/2022 |
As you can see, for row 1 column 3 has the value 1/12/2022 and column 4 is incremented by 1 month each time, giving 4 output rows for row 1. The same applies to row 2.
Thanks in advance
Answers
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
Hi,
I would do this with a custom Python step (or create a plugin).
Alternatively, you can create a new column 'i' holding the values 1, 2, 3, 4 and use the 'Increment date' step.
For the Python approach, something like the following should work, although you'll have to adjust it to handle your date format (this example increments a numeric field):
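    def process(row):
        # Define parameters
        num_rows = 4
        field_to_increment = 'price_first_item_purchased'
        ret = []
        for i in range(num_rows):
            # Copy the row so each output row is independent
            newrow = dict(row)
            # Add i + 1 so each copy is one step further from the base value
            newrow['new_column'] = float(row[field_to_increment]) + i + 1
            ret.append(newrow)
        return ret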
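For the date case in the question, here is a minimal sketch of the same idea using only the Python standard library. It assumes the dates are strings in month/day/year order (which matches the sample output, where the first number goes up by one each time) and that the columns are literally named "Column 3" and "Column 4". The helper add_months is hypothetical, not a Dataiku function, and clamping the day to the end of shorter months is my own choice.

```python
import calendar
from datetime import date, datetime


def add_months(d, months):
    """Shift a date forward by `months` (hypothetical helper).

    The day is clamped to the last day of the target month,
    so Jan 31 + 1 month gives Feb 28 (or Feb 29 in a leap year).
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def process(row):
    # Assumed parameters: adjust the names and format to your dataset
    num_rows = 4
    date_field = "Column 3"
    new_field = "Column 4"
    date_format = "%m/%d/%Y"  # assumption: month comes first

    base = datetime.strptime(row[date_field], date_format).date()
    ret = []
    for i in range(1, num_rows + 1):
        new_row = dict(row)  # copy so every output row is independent
        shifted = add_months(base, i)
        # Write the date back without zero padding, like the sample data
        new_row[new_field] = f"{shifted.month}/{shifted.day}/{shifted.year}"
        ret.append(new_row)
    return ret
```

If your dates are actually day/month/year, change date_format to "%d/%m/%Y" and swap the order in the output string. As with the snippet above, the function returns a list of rows, so the step has to be set up to produce several output rows per input row.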