Survey banner
The Dataiku Community is moving to a new home! We are temporary in read only mode:

# min between 2 dates

Solved!
Level 3
###### min between 2 dates

Hello,

I'm trying to find the 1st date between 2 date fields.

I was thinking of using a min formula but there may be missing values ​​in these fields and the formula doesn't seem to work in this case.

Is there another solution other than an "if then" formula ?

1 Solution
Level 3

Hi @EdBerth,

If I understood correctly, based on your latest comment, you need an automated way (minimize user input in case new date columns are added in the dataset) to find the minimum date among multiple date values (not just between two). One way to achieve this is through the Python function step in a Prepare recipe which applies a Python function to each row in the dataset. What the function in the attached screenshot does, is that, per row, it takes the values of all the columns, sorts them, and finally, returns the second element of this list.

Note: The first element of this list will be None because this is the value that the outcome column originally has, since it is a placeholder column and it is empty at the beginning.

I hope this helps!

-Konstantina

4 Replies

Is there another solution other than an "if then" formula ? => Not sure why you think the if approach is not suitable?

``min(if(isNull(val("date_col1")), val("date_col2"), val("date_col1")), if(isNull(val("date_col2")), val("date_col1"), val("date_col2")))``

Level 3
Author
`I am not against using the if approach but it can quickly become complex if there are more than 2 fields in date format.That's why I was looking for another approach.`
Level 3

Hi @EdBerth,

If I understood correctly, based on your latest comment, you need an automated way (minimize user input in case new date columns are added in the dataset) to find the minimum date among multiple date values (not just between two). One way to achieve this is through the Python function step in a Prepare recipe which applies a Python function to each row in the dataset. What the function in the attached screenshot does, is that, per row, it takes the values of all the columns, sorts them, and finally, returns the second element of this list.

Note: The first element of this list will be None because this is the value that the outcome column originally has, since it is a placeholder column and it is empty at the beginning.

I hope this helps!

-Konstantina

Level 2

Hello !

I had the same issue with 5 columns containing dates (the 5 columns might have blanks but there was always at least two filled with dates).

It's not very pretty but here's how I managed it :

min(
if(isBlank(DATE1),asDate("2099-01-01", "yyyy-MM-dd"),asDate(DATE1)),
if(isBlank(DATE2),asDate("2099-01-01", "yyyy-MM-dd"),asDate(DATE2)),
if(isBlank(DATE3),asDate("2099-01-01", "yyyy-MM-dd"),asDate(DATE3)),
if(isBlank(DATE4),asDate("2099-01-01", "yyyy-MM-dd"),asDate(DATE4)),
if(isBlank(DATE5),asDate("2099-01-01", "yyyy-MM-dd"),asDate(DATE5)))

I also need the max so I guess I'll use the same trick but with the date "1900-01-01" as default.

Hope this helps !