(Prepare Recipe) Formula Processor for range values in column of numerical and timestamp format.

gerryleonugroho
Level 3
(Prepare Recipe) Formula Processor for range values in column of numerical and timestamp format.

So, I'm trying to create a column based on another column values, first would be on the numerical range column case I'm having.

As per the following explanation :

column_a
0
1
2
3
4
5
...
100

 

My objective is trying to create another column, let's just call it column_b, I'm aware, I could achieve this also through the mean of the bin recipe processor, but I wish to achieve them, through the formula processor instead.

So column_a consist of values ranging from 0 to 100. And I wish to create another column based on the range of values from the column_a. I've been having problem, since my code wouldn't work, as per the following example:

 

 

if (column_a <= 15, 0,
if(column_a > 16 && column_a <= 30,  1,
if(column_a > 31 && column_a <= 45,  2,
if(column_a > 46 && column_a <= 60,  3,
if(column_a > 61, 4)
))
)

 

 

 I wish to assign value to column_b, per range of values coming from column_a.

- if value less or equal to 15, then assign value of 0

- if value greater then 16, but less or equal to 30, assign 1 and so forth.

And my second problem would still be similar, but this time with the Timestamp format, I wish to achieve similar table output but with different data format. As the following:

timestamp_a
2015-01-01T00:00:00.000Z
2015-01-02T00:00:00.000Z
2015-01-03T00:00:00.000Z
2015-01-04T00:00:00.000Z
2015-01-05T00:00:00.000Z
2015-01-06T00:00:00.000Z
2015-01-07T00:00:00.000Z
....
2020-05-31T00:00:00.000Z 

 

So I wish to achieve somewhat similar output, by assigning a value to another column based on the range value from the timestamp_a, as per the following:

 

 

if (timestamp_a <= “2016-01-01T00:00:00.000Z”, 0, 
if (timestamp_a > “2016-01-01T00:00:00.000Z” && timestamp_a <= “2016-12-31T00:00:00.000Z”,  1,  
if (timestamp_a > “2017-01-01T00:00:00.000Z” && timestamp_a <= “2017-12-31T00:00:00.000Z”,  2,
if (timestamp_a > “2018-01-01T00:00:00.000Z” && timestamp_a <= “2018-12-31T00:00:00.000Z”,  3,
if (timestamp_a > “2019-01-01T00:00:00.000Z”, 4)
))
)

 

 

And yes, I'm aware I could do this using another processor recipes that related to date manipulation, but I simply wish to achieve this through the mean of formula based one instead. Wish to assign new column values, based on the range of 

Thanks for all the input.

 

Cheers,

 

0 Kudos
2 Replies
Liev
Dataiker Alumni

Hi @gerryleonugroho 

There was a small issue in your formula. The last IF statement was missing the else clause, I think there was also a missing closing bracket.

if(column_a <= 15, 0,
   if(column_a > 15 && column_a <= 30,  1,
      if(column_a > 30 && column_a <= 45,  2,
         if(column_a > 45 && column_a <= 60,  3,
            if(column_a > 60, 4, "")
            )
        )
     )
  )

 

An alternative way of doing this, since your conditions are incremental, is to start from the end and work your way backwards. This will lead to a less verbose option, have a look as they produce the same outputs.

if(column_a > 60, 4, 
   if(column_a > 45, 3, 
      if(column_a > 30, 2,
         if(column_a > 15, 1, 0)
         )
      )
  )

 

0 Kudos
gerryleonugroho
Level 3
Author

Hi @Liev, thanks. I also like the second option workaround better, really appreciate it. Would the solution also work on my timestamp data format too?

 

Thanks,

0 Kudos

Labels

?
Labels (4)
A banner prompting to get Dataiku