Survey banner
Share your feedback on the Dataiku documentation with this 5 min survey. Thanks! TAKE THE SURVEY

Replace some values in a column with previous values from another column

Solved!
ttoropov
Level 2
Replace some values in a column with previous values from another column

Good day!

I am trying to replace some of the values in one column (empty ones) with values from another column but from previous row. Can't find a solution.

For example: I have columns "A" and "B". What should I do, if I need to replace empty cells A3, A7, A20 etc (where 3, 7, 20 are the numbers of rows from the top) with the values from B2, B6, B19 etc cells?

What would you recommend?

 

 

 

0 Kudos
1 Solution

This can be done with a Window recipe. Assuming you have a dataset like this:

image.png

Create a Window recipe and set the Window Definition. You need to have a sort order to fetch the previous value so pick the column(s) that has your rows ordered properly:

image.png

Then set the aggregation to section to calculate the lag of the row (previous). You can also use the lead (next) if you want:

image.png

Run the recipe and you will have the output like this:

image.png

Now simply replace the value of A by B in a Prepare recipe using whatever logic you want.

 

 

 

 

 

View solution in original post

8 Replies
Turribeach

Where is your dataset stored? (which technology). Are you looking for a coders solution (Python recipe) or a clickers solution (Visual recipe)?

0 Kudos
ttoropov
Level 2
Author

Hello, right, sorry. Forgot to mention.

I'm working in a prepare recipe.

0 Kudos

Where is your dataset stored? (which technology)

0 Kudos
ttoropov
Level 2
Author

Sorry, It's a Server's Filesystem (in csv format), if that's what you are asking for.

0 Kudos

Yes, thanks. I wanted to check if SQL was available as this will be much simpler in SQL. 

0 Kudos

This can be done with a Window recipe. Assuming you have a dataset like this:

image.png

Create a Window recipe and set the Window Definition. You need to have a sort order to fetch the previous value so pick the column(s) that has your rows ordered properly:

image.png

Then set the aggregation to section to calculate the lag of the row (previous). You can also use the lead (next) if you want:

image.png

Run the recipe and you will have the output like this:

image.png

Now simply replace the value of A by B in a Prepare recipe using whatever logic you want.

 

 

 

 

 

ttoropov
Level 2
Author

Good idea! Thank you very much!

0 Kudos
klein
Level 1

Thank you for the solution. It was exactly what I was looking for. It took me a while to find this page though.

0 Kudos