I am converting an Alteryx workflow to Dataiku and one option in Alteryx is to create multiple rows based on some criteria. I would like to take each row in my data set and copy it , adding a new column called MonthDate that starts with January of this year and goes to the current month so it keeps adding months as the month progresses in the calendar year.
For instance I have 2 rows:
1 Mary Smith
2 Joe Henry
If I choose to start at 2020-05-01 as the date I would see:
1 Mary Smith 2020-05-01
1 Mary Smith 2020-06-01
1 Mary Smith 2020-07-01
1 Mary Smith 2020-08-01
2 Joe Henry 2020-05-01
What is the best way to do this in Dataiku as I don't see anything that would be able to do this. Do I need to use python?
Thanks for all the help in advance!
Hi @aw30, you can go with a full Python recipe in this situation, or you can go "Python lite" by using a Python function within the Prepare recipe in "Rows" mode. See, for example: https://academy.dataiku.com/advanced-prepare-recipe-usage/513260. This can be convenient if you have other Prepare steps to take.
Hi, for this use case you can use a cross join (or cartesian product). In your case you will need your original dataset and a second dataset containing one row per date of interest, so the dataset should look like this:
After that you can open a join recipe, select cross join as join type, and then select your columns of interest. You can find more information on how to setup a join recipe here: https://academy.dataiku.com/visual-recipes-overview/479217
Thank you for the response! I am aware of this option but wondered if there was an option that works on demand instead of having a second dataset with all the dates. I'll mark this as a solution because it is a way to accomplish this. Thank you again!