Create a new Date column

Alexรฉ
Level 1
Create a new Date column

Hi!

Would you please tell me how can I create a new date column containing values from a specified start to end date in Dataiku?

Thanks

0 Kudos
5 Replies
tgb417

@Alexรฉ 

Welcome to the Dataiku Community.

I'm not clear what you mean when you say "new date column containing values from a specified start to end date"

Here are some of my questions:

  • Do you want to concatenate the start and end date into a single column? For display purposes?
  • Do you have the Start and end date already in columns and want to do date arithmetic to calculate age or something like that?

Can you clarify a bit more what you are trying to do?

  • What data do you currently have?
  • How would you like to transform that data?
  • Once transformed what would the data ideally look like?
--Tom
Alexรฉ
Level 1
Author

Hi @tgb417 !

thank you!

In fact, I have a start date for instance "2005-01" and an end date "now" and I would like to add a Date column to my dataset containing timestamps from the start date to the end date with format "yyyy-MM".

2005-01

2005-02

2005-03

...........

2021-04

.

tgb417

@Alexรฉ ,

@StanG answer could be useful to you.  However, I have to say, I'm still personally not completely clear what the table you are looking for, looks like.

In reading what you have said I'm envisioning a table like the one below.  Are you wanting to create a table that sort of looks like this?

Start_DateNew_Date_Column
2021-01[2021-01, 2021-02,2021-03, 2021-04]
2021-02[2021-02,2021-03, 2021-04]
2021-03[2021-03, 2021-04]
2021-04[2021-04]

 

And in May of 2021 when "now" would looks like this 2021-05 the table will be like this?

Start_DateNew_Date_Column
2021-01[2021-01, 2021-02,2021-03, 2021-04, 2021-05]
2021-02[2021-02,2021-03, 2021-04, 2021-05]
2021-03[2021-03, 2021-04, 2021-05]
2021-04[2021-04, 2021-05]
2021-05[2021-05] 

 

Based on your description this is what I currently understand. 

What also might be helpful is a bit more context about how you plan to use the data.

Let us know.

--Tom
VitaliyD
Dataiker

Hi Alexรฉ,

It is still a bit confusing what you are trying to achieve. Let's assume you have a dataset that has a Date column in 'yyyy-MM' format and you don't have all Date values from the start date until today in your dataset. Example below.

Value Date Value 2
1 2005-01 1000
2 2006-03 1001
3 2007-03 1002
4 2020-01 1003

 

If you want to modify the dataset to include all Dates from the start Date until today (insert missing Dates into dataset), then one of the ways to achieve this is to use the Python Code recipe.

Screenshot 2021-04-22 at 00.50.28.png

Below is the code I wrote to complete this task.

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
from datetime import datetime

# Read recipe inputs
dates = dataiku.Dataset("input_dataset") #input dataset
dates_df = dates.get_dataframe()

#sort by date
sorted_df = dates_df.sort_values(by='Date', ascending=True)
#get start date
start_date = sorted_df.iloc[0]['Date']

#generate date range
year_month_range = pd.date_range(start_date, datetime.today().strftime("%Y-%m"), freq='MS')
# create DataFrame
df_year_month_range = pd.DataFrame(year_month_range)
#rename column to merge two datasets
df_year_month_range.rename(columns = {0:'Date'}, inplace = True)
#format date to the same fromat for merging
df_year_month_range['Date'] = df_year_month_range['Date'].dt.strftime('%Y-%m')
#merge two DataFrames
df_merged = pd.merge(df_year_month_range, dates_df, how="left", on=["Date"])

# Write recipe outputs
dates_added = dataiku.Dataset("output_dataset")
dates_added.write_with_schema(df_merged)

The above code reads input dataset, then gets start Date from column Date, then creates date range DataFrame with all Dates from the start Date until today in 'yyyy-MM' format, then merge it with DataFrame created from input dataset, and then saves merged DataFrame into output dataset.

As the result, you will get a dataset that will include all dates. To better understand what has been done, you can find input and output datasets attached in form of CSV files.

I hope this will be useful,

Vitaliy

 

StanG
Dataiker

Hi,
If you have a column with values corresponding to the number of month after your start date, you can create a date column in a prepare recipe with the Formula processor using the formula inc("2005-01", month_number, "month"). To get a specific date format, you can use the Format date with custom format processor and enter yyyy-MM as the Date format.

Input:
month_number
0
1
2

Output:
month_number, date
0, 2005-01
1, 2005-02
2, 2005-03