(Prepare Recipe) Formula Processor for range values in column of numerical and timestamp format.

Gerry Leo
Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
edited July 16 in Using Dataiku

So, I'm trying to create a column based on another column values, first would be on the numerical range column case I'm having.

As per the following explanation :

column_a
0
1
2
3
4
5
...
100

My objective is trying to create another column, let's just call it column_b, I'm aware, I could achieve this also through the mean of the bin recipe processor, but I wish to achieve them, through the formula processor instead.

So column_a consist of values ranging from 0 to 100. And I wish to create another column based on the range of values from the column_a. I've been having problem, since my code wouldn't work, as per the following example:

if (column_a <= 15, 0,
if(column_a > 16 && column_a <= 30,  1,
if(column_a > 31 && column_a <= 45,  2,
if(column_a > 46 && column_a <= 60,  3,
if(column_a > 61, 4)
))
)

I wish to assign value to column_b, per range of values coming from column_a.

- if value less or equal to 15, then assign value of 0

- if value greater then 16, but less or equal to 30, assign 1 and so forth.

And my second problem would still be similar, but this time with the Timestamp format, I wish to achieve similar table output but with different data format. As the following:

timestamp_a
2015-01-01T00:00:00.000Z
2015-01-02T00:00:00.000Z
2015-01-03T00:00:00.000Z
2015-01-04T00:00:00.000Z
2015-01-05T00:00:00.000Z
2015-01-06T00:00:00.000Z
2015-01-07T00:00:00.000Z
....
2020-05-31T00:00:00.000Z

So I wish to achieve somewhat similar output, by assigning a value to another column based on the range value from the timestamp_a, as per the following:

if (timestamp_a <= “2016-01-01T00:00:00.000Z”, 0, 
if (timestamp_a > “2016-01-01T00:00:00.000Z” && timestamp_a <= “2016-12-31T00:00:00.000Z”,  1,  
if (timestamp_a > “2017-01-01T00:00:00.000Z” && timestamp_a <= “2017-12-31T00:00:00.000Z”,  2,
if (timestamp_a > “2018-01-01T00:00:00.000Z” && timestamp_a <= “2018-12-31T00:00:00.000Z”,  3,
if (timestamp_a > “2019-01-01T00:00:00.000Z”, 4)
))
)

And yes, I'm aware I could do this using another processor recipes that related to date manipulation, but I simply wish to achieve this through the mean of formula based one instead. Wish to assign new column values, based on the range of

Thanks for all the input.

Cheers,

Answers

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    edited July 17

    Hi @gerryleonugroho

    There was a small issue in your formula. The last IF statement was missing the else clause, I think there was also a missing closing bracket.

    if(column_a <= 15, 0,
       if(column_a > 15 && column_a <= 30,  1,
          if(column_a > 30 && column_a <= 45,  2,
             if(column_a > 45 && column_a <= 60,  3,
                if(column_a > 60, 4, "")
                )
            )
         )
      )

    An alternative way of doing this, since your conditions are incremental, is to start from the end and work your way backwards. This will lead to a less verbose option, have a look as they produce the same outputs.

    if(column_a > 60, 4, 
       if(column_a > 45, 3, 
          if(column_a > 30, 2,
             if(column_a > 15, 1, 0)
             )
          )
      )

  • Gerry Leo
    Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭

    Hi @Liev
    , thanks. I also like the second option workaround better, really appreciate it. Would the solution also work on my timestamp data format too?

    Thanks,

Setup Info
    Tags
      Help me…