Dynamic column value
There are two columns with object id and state(ex. 1,2,3) in the dataset.
Same object id can have multiple states.
Objective is two pick the greatest state for a particular object id in the distinct visual recipe.
Please let me know how to address this
With Regards,
Satya
Best Answer
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
With your current values, your requirement is impossible to meet, because the order of those strings is not the order of your lifecycle. So you need to use values that enable sorting.
Instead of the Distinct recipe, did you try to use the Group recipe?
- In the field aggregations, even for string columns, you can use the Max metric (see attached image)
- So, change your columns values to be "1 Started", "2 Reviewed", ...
- Then apply the max to that column
I hope this is clear. Good luck.
I hope this is clear now. Good luck
Answers
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
Instead of using Distinct, use the Group recipe, which allows you to indicate the distinct keys, but also the Max (and many other metrics) of the other columns.
I hope this helps.
Best regards
-
Hi,
Thank You for your response.
The scenario is the states are the strings. for ex: a particular object has the different states (started,review in progress, completed,....) in its life cycle.
The object can have 1 state i.e. started - In this case we need to output as started in the distinct recipe for that object.
Another object can have 2 states i.e(started, review in progress) - In this case we need to output as review in progress in the distinct recipe for this object
Hope the above statements give more clarification
With Regards,
Satya
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
The alphabetic order of the current values (started, review, completed, etc) does not match the order of the lifecycle. If you want to identify the "greatest state", then the values of that column must enabling ordering.
My suggestion, in a previous Prepare recipe, add a new column with a numerical value for the state (1, started), (2, Review), etc., then in the group by just identify the Max of that column.
I hope this helps.
Best regards
-
Hi,
Yes this can be one of the solution, but when the final table is published using the distinct recipe, as there
will be another columns in the dataset.
Whatever alias we have considered i.e 1- started, 2- review in process, 3- approved, 4- rejected.
it should print the final state in the column not the alias.
How to address this.
With Regards,
Satya