Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on April 27, 2022 8:14AM
Likes: 0
Replies: 7
I have to generate a date row whose conditions are as follows ,
initial condition - datetimeadd([date],1,"days")
condition expression - newrow <= DateTimeAdd([date],365,"days")
looping condition - DateTimeAdd(newrow,1,"days")
Here, date is already existing row .In alteryx we created this using generate rows tool , in Dataiku is it possible to create this using windows , if so please guide me . Or else which recipe should choose for this one .
Hi,
I answered your other post, which seems to be about the same challenge.
Welcome to the Dataiku Community.
From your post, I'm also not exactly clear what you are trying to achieve. It sounds like you may want to create a "date table" for use in reporting, so you don't miss any dates in your final output data set.
I don't recognize the language or syntax you are using above. It looks like it might be a set of Alteryx's function in their functional language. Dataiku has a function language, however it is different than Alteryx's language.
If it turns out that I've guessed correctly and you are trying to create a date table. You might find the following helpful.
If I was trying to create a data table I'd likely create a quick python recipe. Something like this might do the job.
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
def create_date_table2(start='2000-01-01', end='2050-12-31'):
df = pd.DataFrame({"Date": pd.date_range(start, end)})
df["Day"] = df.Date.dt.weekday_name
df["Week"] = df.Date.dt.weekofyear
df["Quarter"] = df.Date.dt.quarter
df["Year"] = df.Date.dt.year
df["Year_half"] = (df.Quarter + 1) // 2
return df
# Thanks to this Stack Overflow post for the code above
# https://stackoverflow.com/questions/47150709/how-to-create-a-calendar-table-date-dimension-in-pandas
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Compute recipe outputs
DateTable_df = create_date_table2()
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Write recipe outputs
DateTable = dataiku.Dataset("DateTable")
DateTable.write_with_schema(DateTable_df)
If I were using a SQL database under Dataiku DSS, I might use a recursive CTE to produce a date table. Finally depending on why I needed the data table, For example if I were doing time series calculations, I might use the time series plugin. There is a tutorial on using the time series plugin to prep your data. https://academy.dataiku.com/path/ml-practitioner/time-series-preparation-1
Hope one of those ideas is useful to you. If I've completely missed the mark with these few ideas; Please do come back with a more general description of your use case maybe describe without Alteryx code. Maybe giving a sense of the initial condition of the data and the hoped for end condition of the data. Someone here should be able to help.
--Tom
Hi ,
Thanks for your elaborate response . Actually I am migrating workflow from Alteryx to Dataiku , there I had challenge in generating a new column for each row based on existing column . In Alteryx we used generate row tool and I have already mentioned the conditions we used in Alteryx . Here in Dataiku I am trying to achieve this by using windows recipe computed column , but I am clueless how to execute this same function .
So from your most recent note above it sounds like you are just adding a new column based on data in the current row with no reference to any other rows in your dataset. Is that a correct understanding of what you are trying to do? If that is the case the window recipie is not going to be of much help; the visual prepare is where to go. If you need to reference other rows then window recipe is where to go.
Hope that helps.
ID | DATE | NEWROW |
existing col | new col to be created | |
123 | 1/1/2021 | 1/2/2022 |
123 | 1/1/2021 | 1/3/2022 |
123 | 1/1/2021 | 1/4/2022 |
123 | 1/1/2021 | 1/5/2022 |
123 | 1/1/2021 | 1/6/2022 |
In the use case you are describing I might create a date table as I have described, and then use a visual join recipe to connect the date table to your original data set. This would create all of the new rows you need.
Just my $0.02. Hope it helps.
P.S. I'd also look at whatever the problem I'm trying to solve and see if there is a way refactor the problem. The repeating of data in the way is often a flag for me that I need to re-consider the problem. Just something to consider.
Hi @shri
,
I’m a bit late to the party here, but I stumbled upon this post from April and decided to poke around my DSS to see what I could come up with. I have good news: you can use the Prepare recipe to generate rows using the forRange function and the ‘fold an array’ processor
The forRange function has the following syntax, forRange(from, to, step, variable v, expression e), and gives you an array as its output. It iterates, starting w/ the ‘from’ value, and increments by ‘step’ while the value is less than ‘to’. At each iteration, it evaluates ‘expression e’ where v is the current value.
Here’s two examples with some sample data
Case 1: same as the use case described above where you create 365 rows, one for each day in the year following a start date
Case 2: Let's level up
Enjoy & I hope this helps!
Ashley