Ready for Dataiku 10? Try out the Crash Course on new features!GET STARTED

Rules for replacement based on different conditions

Aleksandra
Level 2
Rules for replacement based on different conditions

Hi all,

I need do create rules for update data in one final column by condition of another columns. For example, if value in one column bigger than another or if sum of another column is lower than some cut-off etc. I tryed to configure it by Prepare recipe, but there are no any function for smart replacement (or i didn't find it). 

Does anybody no how to create this kind of rule for replacement?


Operating system used: Windows

0 Kudos
10 Replies
emate
Neuron
Neuron

Hi @Aleksandra 

Have you tried Prepare recipe & formula?

for example if(Col=='X', 1 , 0) or to check if one col is > than the other:

 

If_val.png

 

Thanks

Mateusz

0 Kudos
Aleksandra
Level 2
Author

Hi Mateusz

Yes, i already used it for another calculations. The problem is number of conditions, there are can be a lot of them and them should influence to one result column. If there are one combination of condition in another columns (for example columns B, C) it will one result in column Z, in another combination (columns B,C D) another result also in column Z. And it likely 5-10 combinations of different columns.

0 Kudos
emate
Neuron
Neuron

Hi @Aleksandra 

Can't you just use multiple If conditions? and maybe use and (&&) / or (||) operators? Output column will be the same for multiple conditions

if(A=='B','C', if(A=='D','E', if(A=='F','G', 'blank') ) )

Maybe I just can't visualize the issue, if you could paste here some screen with example?

Other way, you could use python to use if, elif, else etc?

Thanks

Mateusz

0 Kudos
Aleksandra
Level 2
Author

Hi Mateusz

Sorry for delay

I can't use If function (as I think) due to a lot of possible conditions. I want to share example.

We have 5 accounts with clients in them. The result what we need is prioritize client, for example, from 0 to 5 priority and sum of priority for each account should be lower or equal to some target (350). Each client have different criterias for prioritization (Criteria 1-9) and already have original Priority based on them. I need to update (or add new) Priority till the sum of it for each account will be lower or equal to target. 

I attached xls sheets with example for correction. Possible steps for updating of Priority:

Step 1 - Change Priority 5 to 4 if sum of priority for Account > norm

Step 2 - Change Priority 4 to 3 if sum of priority for Account > norm and Criteria 2 < 10

Step 3 - Change Priority 3 to 2 if sum of priority for Account > norm and Criteria 3 = B and Criteria 8 = NP

…...etc till the sum of priority for each Account will be <= norm

We have some similar project with necessary to create this replacement/ calculation. There are can be different number of criterias. Also i want to note that there are usually no dependence of order of steps and priority (i think that we cant create column with additional prioritization for update) AND if after the step sum of priority is correct there are no updating for this account on next steps. Count of account and client are big, so we can't do it manually.

Appreciate for your help. Ready to answer for questions

0 Kudos
emate
Neuron
Neuron

Hi,

Well, that's an intresting one...

I mean, I'm not sure if I get the logic of what are you trying to do:

First, you would like to check if for example  sum of Account_1 > norm, if yes, each client with priority flag = 5 would get 4 and those who has no flag would get 4 also? and then you would like to check the sum of priority again and if its still above you want to move to the step 2? and so on, untill sum < norm? Is it what you are looking for?

If yes, I would start with:

1. With group by create 2 columns for each account (like a table similar to what you have shown in attached screen) (one for sum value for each account and norm) and then join it with orginal datasource to have these for each row (client)

2. Create column C with condition like if a > b then column c = "false"  & if false change for each row priority 5 -> 4 and blanks = 4, remove these 2 columns you just used to check sum value + norm

3. After this prepare recipe use group by again to calculate sum for each account (since you recoded some priority values) and join it once more and with similar logic follow with step 2 only if col c would still be "false" 

4. Repeat for each priority in order from 1 to 9?, I don't think this is efficient, but that would work I guess? I mean the moment col C = True for this client/account you know that you have reached your treshhold, and the next steps would no execute as in prepare recipe you have this condition to change priority only if Col C = False.


I'm not sure if that will work, its just a crazy idea I have of the top of my head

Mateusz

 

0 Kudos
Aleksandra
Level 2
Author

Thanks!

You understand me correct as I see. I think that I hoped to avoid this scheme because there are can be a lot of steps..

I have question, how to stop updating on one specific step? I mean that I need to reduce priority for different number for different account because sum of it can be also different. How can I control it?

Aleksandra

0 Kudos
emate
Neuron
Neuron

Hi,

I mean this is the visualization of my idea, each step is diffrent recipe, so 1st is group by, 2nd is prepare, etc... so I think you would run the whole flow, but since the flow will be build to check sum after each step, it just won't affect the steps where condition is already met. I'm not sure if you get my point or I was clear enough? I didnt test it, so you will have to try it, to see if will work, but I think it should.

Idea.pngIdea.png

0 Kudos
Jurre
Level 4

hi @Aleksandra  ,

For these kind of challenges i depend on formula's combined with temporary helper columns as suggested by @emate . To keep track of what happens where i use zones in de flow. This way it all comes down to think about what needs to be done in what order and what helperinfo can eh..help.  A lot of operations can be combined with formula's, which can make it challenging to debug. For more enthousiastic formula's i use an external editor as DSS's edit window is a bit small for my taste.  

 

0 Kudos
Aleksandra
Level 2
Author

Thanks a lot @Jurre @emate , I think I need to try to use your recommendations before any additional questions. Now it's not really clear for me about last topic, mostly because I don't understand part about DSS's edit window.. Hope that everything will be clear during realization. 

Thanks again!

Aleksandra

0 Kudos
Jurre
Level 4

additional questions are what we are here for @Aleksandra  🙂

please be aware : as my projects are all about migrating protected datasets to new containers with a focus on transparency, i chose to follow a more visually oriented path (hence the use of subzones where parts of a dataset are processed). For customers this is more convenient as they are of a different language group (as in: don't speak 101110). That choice had nothing to do with whether or not it is easier to do extensive ETL work directly in codeprocessors or with visual processors. Anyhow, i like both.

Setup info

?
Tags (2)
A banner prompting to get Dataiku DSS