topic Re: Prepare Recipe Formula in Using Dataiku DSS
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4486#M3102
<P>Hi,</P><P>Just to be sure I understood your use case. The input dataset will look like the one in the attached screenshot ?By that I mean that it will be made of the 6 columns and you need to both check for the first defined "ColN" value. If there are multiple "ColN" values, you then compare the "ValN" value and the N that needs to be kept is the one with the greater "ValN" ? (and another particular case where all the "ColN" are undefined/unknown")</P><P>You want to produce 2 new columns that will contains for each line the "ColN" and "ValN" values for the computed "N" ?</P>Tue, 04 Feb 2020 15:06:51 GMTMickaelH2020-02-04T15:06:51ZPrepare Recipe Formula
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4481#M3101
<P>Hi</P><P>I am trying to extract a Column using 6 related columns, the output column will the input column with the highest associated score if atleast one of the input columns has a value other than (Undefined or Unknown)</P><P>Please refer to the attached excel (Col4 & Val4 are the output columns), is there any way to do this other than the formula step in prepare recipe</P><P> </P><P>Thanks</P><P>VB</P>Tue, 04 Feb 2020 14:16:36 GMThttps://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4481#M3101bhakuniv2020-02-04T14:16:36ZRe: Prepare Recipe Formula
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4486#M3102
<P>Hi,</P><P>Just to be sure I understood your use case. The input dataset will look like the one in the attached screenshot ?By that I mean that it will be made of the 6 columns and you need to both check for the first defined "ColN" value. If there are multiple "ColN" values, you then compare the "ValN" value and the N that needs to be kept is the one with the greater "ValN" ? (and another particular case where all the "ColN" are undefined/unknown")</P><P>You want to produce 2 new columns that will contains for each line the "ColN" and "ValN" values for the computed "N" ?</P>Tue, 04 Feb 2020 15:06:51 GMThttps://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4486#M3102MickaelH2020-02-04T15:06:51ZRe: Prepare Recipe Formula
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4487#M3103
<P>Hi Michael </P><P>The input dataset will be the first 6 columns always no need to check on the count there</P><P>the output should be the column and the corresponding value( where the value is the highest) however if any of the columns is undefined or unknown it should be ignored unless all 3 columns are undefined or unknown </P><P>the expected output is highlighted in the attachment for each row </P>Tue, 04 Feb 2020 15:34:23 GMThttps://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4487#M3103bhakuniv2020-02-04T15:34:23ZRe: Prepare Recipe Formula
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4488#M3104
<P>So to do that, you could use a "<STRONG>Python function</STRONG>" step in a "<STRONG>Prepare recipe</STRONG>".</P><P>You must use the mode "<STRONG>row: return a row for each row</STRONG>".</P><P>This way you will able to write all your logic to compute the "<STRONG>Col4</STRONG>" and "<STRONG>Val4</STRONG>" values.</P><P>You can then add another step in your prepare recipe to remove all the other columns if they are not useful.</P>Tue, 04 Feb 2020 15:49:06 GMThttps://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4488#M3104MickaelH2020-02-04T15:49:06ZRe: Prepare Recipe Formula
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4525#M3117
<P>Thanks Mickael, I assume there is no other way to get this done other than a custom formula</P>Wed, 05 Feb 2020 14:21:57 GMThttps://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4525#M3117bhakuniv2020-02-05T14:21:57ZRe: Prepare Recipe Formula
https://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4527#M3118
<P>Hi,</P><P>Since you need to:</P><UL><LI>define some "complex" rules to extract the values</LI><LI>generate 2 columns</LI></UL><P>I think it's the easiest and fastest way to do that.</P><P><SPAN>If I understood correctly your use case but you won't be able to translate that into simpler processors. Maybe you can share the corresponding formula used in the spreadsheet that you used for the screenshot?</SPAN></P><P><SPAN>To illustrate the "python function" that could be used to solve your problem here is an example code snippet:</SPAN></P><LI-CODE lang="python">def process(row):
pairs = [['Col1', 'Val1'], ['Col2', 'Val2'], ['Col3', 'Val3']]
max_val = None
max_col = None
max_val_undef = None # Used when all the column names are undefined/unknown
for column_column_name, value_column_name in pairs:
current_column = get_column_name(column_column_name, row)
current_value = float(row.get(value_column_name, '0'))
if current_column is None:
max_val_undef = max(current_value, max_val_undef)
else:
if max_val is None or current_value > max_val:
max_val = current_value
max_col = current_column
if max_col is None:
computed_column = ''
computed_value = max_val_undef
else:
computed_column = max_col
computed_value = max_val
row['Col4'] = computed_column
row['Val4'] = computed_value
return row
def get_column_name(column_column_name, row):
result = row.get(column_column_name, None)
if result.lower() in ['undefined', 'unknown']:
return None
return result</LI-CODE><P>I hope this helps.</P>Wed, 05 Feb 2020 15:35:59 GMThttps://community.dataiku.com/t5/Using-Dataiku-DSS/Prepare-Recipe-Formula/m-p/4527#M3118MickaelH2020-02-05T15:35:59Z