Submit your inspiring success story or innovative use case to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

how to generate rows using windows

shri
Level 1
how to generate rows using windows

I have to generate a date row  whose conditions are as follows ,

initial condition - datetimeadd([date],1,"days")

condition expression - newrow <= DateTimeAdd([date],365,"days")

looping condition - DateTimeAdd(newrow,1,"days")

Here, date is  already existing row  .In alteryx we created this using generate rows tool , in Dataiku is it possible to create this using windows , if so please guide me . Or else which recipe should choose for this one .

0 Kudos
6 Replies
Manuel
Dataiker
Dataiker

Hi,

I answered your other post, which seems to be about the same challenge.

 

0 Kudos
tgb417
Neuron
Neuron

@shri 

Welcome to the Dataiku Community.

From your post, I'm also not exactly clear what you are trying to achieve.  It sounds like you may want to create a "date table" for use in reporting, so you don't miss any dates in your final output data set.

I don't recognize the language or syntax you are using above.  It looks like it might be a set of Alteryx's function in their functional language.  Dataiku has a function language, however it is different than Alteryx's language.

If it turns out that I've guessed correctly and you are trying to create a date table. You might find the following helpful.

If I was trying to create a data table I'd likely create a quick python recipe.  Something like this might do the job.

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
def create_date_table2(start='2000-01-01', end='2050-12-31'):
df = pd.DataFrame({"Date": pd.date_range(start, end)})
df["Day"] = df.Date.dt.weekday_name
df["Week"] = df.Date.dt.weekofyear
df["Quarter"] = df.Date.dt.quarter
df["Year"] = df.Date.dt.year
df["Year_half"] = (df.Quarter + 1) // 2
return df
# Thanks to this Stack Overflow post for the code above
# https://stackoverflow.com/questions/47150709/how-to-create-a-calendar-table-date-dimension-in-pandas

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Compute recipe outputs

DateTable_df = create_date_table2()

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Write recipe outputs
DateTable = dataiku.Dataset("DateTable")
DateTable.write_with_schema(DateTable_df)

 If I were using a SQL database under Dataiku DSS, I might use a recursive CTE to produce a date table.  Finally depending on why I needed the data table, For example if I were doing time series calculations, I might use the time series plugin.  There is a tutorial on using the time series plugin to prep your data.  https://academy.dataiku.com/path/ml-practitioner/time-series-preparation-1 

Hope one of those ideas is useful to you. If I've completely missed the mark with these few ideas; Please do come back with a more general description of your use case maybe describe without Alteryx code.   Maybe giving a sense of the initial condition of the data and the hoped for end condition of the data. Someone here should be able to help.

--Tom

--Tom
0 Kudos
shri
Level 1
Author

Hi ,

Thanks for your elaborate response . Actually I am migrating workflow from Alteryx to Dataiku , there I had challenge in generating a new column for each row based on existing column . In Alteryx we used generate row tool and I have already mentioned the conditions we used in Alteryx . Here in Dataiku I am trying to achieve this by using windows recipe  computed column , but I am clueless how to execute this same function .

0 Kudos
tgb417
Neuron
Neuron

@shri 

So from your most recent note above it sounds like you are just adding a new column based on data in the current row with no reference to any other rows in your dataset.  Is that a correct understanding of what you are trying to do?  If that is the case the window recipie is not going to be of much help; the visual prepare is where to go.  If you need to reference other rows then window recipe is where to go.

Hope that helps.   

--Tom
0 Kudos
shri
Level 1
Author
Spoiler
Your understanding is correct , here I am trying to generate a column named NEWROW , for each row of the existing column  DATE . The output should look like this ,

IDDATENEWROW
 existing colnew col to be created
1231/1/20211/2/2022
1231/1/20211/3/2022
1231/1/20211/4/2022
1231/1/20211/5/2022
1231/1/20211/6/2022

I  have to get 365 rows for each ID . Basically its like a for  loop , while I am trying to achieve this using computed column I get error at the <= symbol . Hope this  gives you more  insight about my problem.

 

0 Kudos
tgb417
Neuron
Neuron

@shri 

In the use case you are describing I might create a date table as I have described, and then use a visual join recipe to connect the date table to your original data set.  This would create all of the new rows you need.

Just my $0.02.  Hope it helps.

P.S.  I'd also look at whatever the problem I'm trying to solve and see if there is a way refactor the problem.  The repeating of data in the way is often a flag for me that I need to re-consider the problem.  Just something to consider.

--Tom