We're excited to announce that we're launching the second installment of Dataiku Product Days Register Now

(Prepare Recipe) Formula Processor for range values in column of numerical and timestamp format.

gerryleonugroho
Level 3
(Prepare Recipe) Formula Processor for range values in column of numerical and timestamp format.

So, I'm trying to create a column based on another column values, first would be on the numerical range column case I'm having.

As per the following explanation :

column_a
0
1
2
3
4
5
...
100

 

My objective is trying to create another column, let's just call it column_b, I'm aware, I could achieve this also through the mean of the bin recipe processor, but I wish to achieve them, through the formula processor instead.

So column_a consist of values ranging from 0 to 100. And I wish to create another column based on the range of values from the column_a. I've been having problem, since my code wouldn't work, as per the following example:

 

 

if (column_a <= 15, 0,
if(column_a > 16 && column_a <= 30,  1,
if(column_a > 31 && column_a <= 45,  2,
if(column_a > 46 && column_a <= 60,  3,
if(column_a > 61, 4)
))
)

 

 

 I wish to assign value to column_b, per range of values coming from column_a.

- if value less or equal to 15, then assign value of 0

- if value greater then 16, but less or equal to 30, assign 1 and so forth.

And my second problem would still be similar, but this time with the Timestamp format, I wish to achieve similar table output but with different data format. As the following:

timestamp_a
2015-01-01T00:00:00.000Z
2015-01-02T00:00:00.000Z
2015-01-03T00:00:00.000Z
2015-01-04T00:00:00.000Z
2015-01-05T00:00:00.000Z
2015-01-06T00:00:00.000Z
2015-01-07T00:00:00.000Z
....
2020-05-31T00:00:00.000Z 

 

So I wish to achieve somewhat similar output, by assigning a value to another column based on the range value from the timestamp_a, as per the following:

 

 

if (timestamp_a <= “2016-01-01T00:00:00.000Z”, 0, 
if (timestamp_a > “2016-01-01T00:00:00.000Z” && timestamp_a <= “2016-12-31T00:00:00.000Z”,  1,  
if (timestamp_a > “2017-01-01T00:00:00.000Z” && timestamp_a <= “2017-12-31T00:00:00.000Z”,  2,
if (timestamp_a > “2018-01-01T00:00:00.000Z” && timestamp_a <= “2018-12-31T00:00:00.000Z”,  3,
if (timestamp_a > “2019-01-01T00:00:00.000Z”, 4)
))
)

 

 

And yes, I'm aware I could do this using another processor recipes that related to date manipulation, but I simply wish to achieve this through the mean of formula based one instead. Wish to assign new column values, based on the range of 

Thanks for all the input.

 

Cheers,

 

0 Kudos
2 Replies
Liev
Dataiker Alumni

Hi @gerryleonugroho 

There was a small issue in your formula. The last IF statement was missing the else clause, I think there was also a missing closing bracket.

if(column_a <= 15, 0,
   if(column_a > 15 && column_a <= 30,  1,
      if(column_a > 30 && column_a <= 45,  2,
         if(column_a > 45 && column_a <= 60,  3,
            if(column_a > 60, 4, "")
            )
        )
     )
  )

 

An alternative way of doing this, since your conditions are incremental, is to start from the end and work your way backwards. This will lead to a less verbose option, have a look as they produce the same outputs.

if(column_a > 60, 4, 
   if(column_a > 45, 3, 
      if(column_a > 30, 2,
         if(column_a > 15, 1, 0)
         )
      )
  )

 

0 Kudos
gerryleonugroho
Level 3
Author

Hi @Liev, thanks. I also like the second option workaround better, really appreciate it. Would the solution also work on my timestamp data format too?

 

Thanks,

0 Kudos

Labels

?
Labels (4)
A banner prompting to get Dataiku DSS