Ready for Dataiku 10? Try out the Crash Course on new features!GET STARTED

Dynamic column value

Solved!
satyanarayana
Level 1
Level 1
Dynamic column value

There are two columns with object id and state(ex. 1,2,3) in the dataset.

Same object id can have multiple states.

Objective is two pick the greatest state for a particular object id in the distinct visual recipe.

Please let me know how to address this

 

With Regards,

Satya

0 Kudos
1 Solution
Manuel
Dataiker
Dataiker

Hi,

With your current values, your requirement is impossible to meet, because the order of those strings is not the order of your lifecycle. So you need to use values that enable sorting.

Instead of the Distinct recipe, did you try to use the Group recipe? 

  • In the field aggregations, even for string columns, you can use the Max metric (see attached image)
  • So, change your columns values to be "1 Started", "2 Reviewed", ...
  • Then apply the max to that column

I hope this is clear. Good luck.

I hope this is clear now. Good luck

 

 

 

View solution in original post

0 Kudos
5 Replies
Manuel
Dataiker
Dataiker

Hi,

Instead of using Distinct, use the Group recipe, which allows you to indicate the distinct keys, but also the Max (and many other metrics) of the other columns.

I hope this helps.

Best regards

0 Kudos
satyanarayana
Level 1
Level 1
Author

Hi,

 

Thank You for your response.

The scenario is the states are the strings. for ex: a particular object has the different states (started,review in progress, completed,....) in its life cycle.

The object can have 1 state i.e. started - In this case we need to output as started in the distinct recipe for that object.

Another object can have 2 states i.e(started, review in progress) - In this case we need to output as review in progress in the distinct recipe for this object

Hope the above statements give more clarification

With Regards,

Satya

0 Kudos
Manuel
Dataiker
Dataiker

Hi,

The alphabetic order of the current values (started, review, completed, etc) does not match the order of the lifecycle.  If you want to identify the "greatest state", then the values of that column must enabling ordering.  

My suggestion, in a previous Prepare recipe, add a new column with a numerical value for the state (1, started), (2, Review), etc., then in the group by just identify the Max of that column.

I hope this helps.

Best regards

 

0 Kudos
satyanarayana
Level 1
Level 1
Author

Hi,

 

Yes this can be one of the solution, but when the final table is published using the distinct recipe, as there

will be another columns in the dataset.

Whatever alias we have considered i.e 1- started, 2- review in process, 3- approved, 4- rejected.

it should print the final state in the column not the alias.

How to address this.

 

With Regards,

Satya

0 Kudos
Manuel
Dataiker
Dataiker

Hi,

With your current values, your requirement is impossible to meet, because the order of those strings is not the order of your lifecycle. So you need to use values that enable sorting.

Instead of the Distinct recipe, did you try to use the Group recipe? 

  • In the field aggregations, even for string columns, you can use the Max metric (see attached image)
  • So, change your columns values to be "1 Started", "2 Reviewed", ...
  • Then apply the max to that column

I hope this is clear. Good luck.

I hope this is clear now. Good luck

 

 

 

View solution in original post

0 Kudos
A banner prompting to get Dataiku DSS