
Parsing & compute time since processors

InesB
Dataiker

Dates are crucial for analysts because they reveal trends, patterns, and seasonality. However, analyzing data by date can be challenging: date formats vary across countries, and dates stored as text are difficult to manipulate.

Example challenges:

  • Different day and month order between countries: dd/MM/yyyy vs. MM/dd/yyyy
  • Different separators: 01/01/2022 vs. 01-01-2022
  • Dates stored as text are hard to sort, filter, and compute with

To address these issues, parsing dates becomes essential. Parsing involves standardizing the date format to facilitate easier manipulation and information extraction. In Dataiku DSS, you can parse dates, creating true date columns that account for format variations and time zones, streamlining the analysis process.
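To make the ambiguity concrete, here is a minimal sketch in plain Python (standard library only, not Dataiku's own implementation). The sample strings are made up for illustration, and Python's strptime codes (%d, %m, %Y) stand in for the dd/MM/yyyy notation used above.

```python
from datetime import datetime

raw = "03/04/2022"  # ambiguous: 3 April or 4 March?

# The same text yields two different dates depending on the assumed format.
as_day_first = datetime.strptime(raw, "%d/%m/%Y")
as_month_first = datetime.strptime(raw, "%m/%d/%Y")

print(as_day_first.date())    # 2022-04-03
print(as_month_first.date())  # 2022-03-04

# Once parsed, values written with different separators compare as equal.
slash = datetime.strptime("01/01/2022", "%d/%m/%Y")
dash = datetime.strptime("01-01-2022", "%d-%m-%Y")
print(slash == dash)  # True
```

This is why the format has to be chosen explicitly before any further date manipulation.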

Working with dates in Dataiku is also straightforward thanks to its native processors. Where Excel typically requires a formula, Dataiku needs only a few clicks.

 

Convert to a different date format

  • Excel: =TEXT(A2,"mm/dd/yyyy")
  • Dataiku: native processor. Use the ‘Parse date’ processor in the Prepare recipe and select a format with the Smart Date tool.

Charts with a defined granularity (day, month…)

  • Excel: included, without dynamic visualization
  • Dataiku: included. In the Charts section of the Visual Analysis part of the Lab, change the granularity in the drop-down menu.

Compute the time difference between dates

  • Excel: =DATEDIF(A2,B2,"d")
  • Dataiku: native processor. Parse your date column, then use the ‘Compute difference between two dates’ tool in the Prepare recipe.

Extract date information (day of the week, week of the year…)

  • Excel: =TEXT(A2,"ddd")
  • Dataiku: native processor. Parse your date column, then use the ‘Extract date Component’ tool in the Prepare recipe.

Remove rows not in a date range

  • Excel: =FILTER(B5:D14,MONTH(B5:B14)=2,"No data")
  • Dataiku: native processor. Parse your date column, then use the ‘Filter rows/cells on date’ tool in the Prepare recipe.

Flag or filter holidays, weekends, and days off

  • Excel: no built-in equivalent
  • Dataiku: native processor. Parse your date column, then use the ‘Flag holidays’ tool in the Prepare recipe.
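For readers who think in code, the date tasks compared above can be sketched in plain Python. This uses only the standard library and made-up sample dates. It illustrates the operations themselves, not how the Dataiku processors are implemented. Holiday flagging is left out because it needs a country-specific calendar.

```python
from datetime import date

# Made-up sample dates, already parsed into true date objects.
orders = [date(2022, 1, 29), date(2022, 2, 14), date(2022, 3, 5)]

# Compute the time difference between two dates, in days.
delta_days = (orders[1] - orders[0]).days

# Extract date components: day of the week and ISO week of the year.
day_name = orders[1].strftime("%A")
iso_week = orders[1].isocalendar()[1]

# Keep only the rows inside a date range (here, February 2022).
start, end = date(2022, 2, 1), date(2022, 2, 28)
in_range = [d for d in orders if start <= d <= end]

# Flag weekends: weekday() returns 5 for Saturday and 6 for Sunday.
is_weekend = [d.weekday() >= 5 for d in orders]

print(delta_days)  # 16
print(iso_week)    # 7
print(in_range)    # [datetime.date(2022, 2, 14)]
print(is_weekend)  # [True, False, True]
```

Every operation here assumes the values are already real dates, which is why each Dataiku approach starts by parsing the date column.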

 

Let’s walk through a concrete scenario. Suppose we are conducting a churn analysis to understand why customers churn and which factors contribute to their decision. To follow along with these steps, refer to the attached "Zip" project.

  • With the crm_web_data dataset selected, initiate a Prepare recipe from the Actions bar on the right.

The default output dataset name, crm_web_data_prepared, fits well, so just create the recipe.

  • Parse the date:

 

From the birth column header dropdown, select Parse date….

In the “Smart Date” dialog, click Use Date Format to accept the detected format.

 


By applying the “Parse date” step to the birth column, we have created a new birth_parsed column in our dataset.
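As a rough analogy in plain Python, the step reads a text column and writes a parsed date next to it. This is a sketch under assumptions: the rows, the yyyy-MM-dd format, and the helper name parse_or_none are all hypothetical, and the real crm_web_data values may look different.

```python
from datetime import datetime

def parse_or_none(value, fmt):
    """Return a date parsed with fmt, or None when the value does not match."""
    try:
        return datetime.strptime(value, fmt).date()
    except (TypeError, ValueError):
        return None

# Made-up rows for illustration only.
rows = [{"birth": "1985-07-23"}, {"birth": "not a date"}]

# Keep the original text column and add the parsed one beside it.
for row in rows:
    row["birth_parsed"] = parse_or_none(row["birth"], "%Y-%m-%d")

print(rows[0]["birth_parsed"])  # 1985-07-23
print(rows[1]["birth_parsed"])  # None
```

Returning None for values that do not match the format keeps bad inputs visible instead of silently guessing a date.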


 

  • Compute time from date columns:

Beyond merely capturing birth dates, we want to understand the age demographics of our customers, particularly those who churn. Because birth_parsed is a true date column, we can use it to compute each customer’s age. This gives us the age distribution of our customer segments and supports targeted churn analysis.

  • From the birth_parsed column header dropdown, select Compute time since.
  • In the Script, change the “Output time unit” to Years, since ages are usually expressed in years rather than days.
  • Change the name of the “Output column” to Age.
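The idea behind this step can be sketched in plain Python. This is an illustration under assumptions, not Dataiku's exact computation: the function name age_in_years is hypothetical, the result is truncated to whole years, and a fixed reference date replaces "now" so the output is reproducible.

```python
from datetime import date

def age_in_years(birth: date, today: date) -> int:
    """Whole years elapsed between birth and today."""
    years = today.year - birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years

reference = date(2024, 5, 31)  # fixed reference date (assumption)

print(age_in_years(date(1990, 6, 15), reference))  # 33
print(age_in_years(date(1990, 5, 31), reference))  # 34
```

Comparing (month, day) pairs avoids the drift you would get from dividing a day count by 365.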
     



  • From the header dropdowns of the birth and birth_parsed columns, select Delete.
  • Click the Run button, then Update Schema.
  • Explore the output dataset. 



You have now computed the age of each customer. This matters for our churn analysis because it lets us investigate whether certain age groups churn at higher rates than others, which helps reveal the patterns and behaviors driving customer churn.
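As a possible next step, churn rate per age group could be sketched as follows in plain Python. Everything here is hypothetical: the customer tuples are toy data, and the band boundaries and the age_band helper are arbitrary choices for illustration, not values from the project.

```python
from collections import defaultdict

def age_band(age: int) -> str:
    """Assign an age to a coarse band (boundaries are arbitrary)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

# Toy data: (age, churned) pairs, invented for this example.
customers = [(25, True), (28, False), (35, False),
             (42, True), (47, False), (63, False)]

totals = defaultdict(int)
churned = defaultdict(int)
for age, has_churned in customers:
    band = age_band(age)
    totals[band] += 1
    churned[band] += has_churned  # True counts as 1, False as 0

# Share of churned customers within each band.
churn_rate = {band: churned[band] / totals[band] for band in totals}
print(churn_rate["under 30"])  # 0.5
```

With real data, the same grouping would show whether churn concentrates in particular age bands.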

Congratulations, you’ve now worked with dates in Dataiku!

 
