
Add rows to dataset and use it in input next time

Solved!
scholaschl
Level 2

Hello,


I want to take a dataset as input, add rows to it, and then, on each future run of the scenario, use as input the dataset including the rows added previously (like a loop).

What would be the best way to proceed?

Thanks in advance,

nmadhu20

Hey @scholaschl ,
So you want to append one row with each scenario run to an existing dataframe. Is that about right?

There are two ways to solve this in my experience:

  • If you want the loop: create a Python recipe and declare its output dataset as an input as well. Then build the new row as an array and append it to the dataframe. For the first build, run the recipe without declaring the dataset as an input; otherwise it throws an error because the dataset is still empty.
import dataiku
import numpy as np

dt = dataiku.Dataset('dataset_name')
df = dt.get_dataframe()

# required computation
new_arr = np.array(['col1_value', 'col2_value', 'col3_value'])  # assuming the dataset has 3 columns

# append the new row after the last existing row of the dataframe
df.loc[len(df)] = new_arr
dt.write_dataframe(df)
  • If you want to avoid the loop: create a recipe that contains all your computation and select the 'Append instead of overwrite' option for its output dataset.
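
For the loop approach, here is a minimal pandas-only sketch of the append step that also covers the first run, when there is nothing to append to yet. The helper name `append_row` and the columns `run_id` and `status` are made up for illustration; they are not part of the Dataiku API. Passing the row as a dict rather than a NumPy array keeps each column's type instead of turning every value into a string.

```python
import pandas as pd


def append_row(existing, new_row):
    """Return a new DataFrame with new_row (dict of column -> value) added at the end.

    `existing` may be None or empty, which is how the first run is modelled here.
    """
    row_df = pd.DataFrame([new_row])
    if existing is None or existing.empty:
        # First run: no previous rows, so the new row is the whole dataset.
        return row_df
    # Later runs: stack the new row under the existing ones and renumber the index.
    return pd.concat([existing, row_df], ignore_index=True)


# Simulate two consecutive scenario runs (hypothetical columns).
history = None
history = append_row(history, {"run_id": 1, "status": "ok"})
history = append_row(history, {"run_id": 2, "status": "failed"})
print(history)
```

In a recipe, `existing` would presumably be the dataframe returned by `get_dataframe()`, and the result would be passed to `write_dataframe()`, as in the snippet above.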

Hope it helps!

Regards,

Madhuleena

scholaschl
Level 2
Author

Thank you for your answer! It will help me.