Check out the first Dataiku 8 Deep Dive focusing on Productivity on October 29th Read More

Calculate Max value across rows to return String

Level 2
Calculate Max value across rows to return String

Hi,

New user without any coding skills trying to work out how to do this using a Visual Recipe (if possible).

I've attached a screenshot which might help explain my goal but essentially I have:

Two columns with categorical strings (String 1 and String 2)

A third column with an integer (Count)

For each string in column 2 I want to find the maximum integer in Column 3 and then return the string from column 1 and copy that into a new column (Correct output) for every row.

Is this possible?

Thanks

 

0 Kudos
1 Reply
Dataiker
Dataiker

Hi @sim0n,

What you're looking to do is indeed possible using 100% Visual Recipes (๐ŸŽ‰ ๐Ÿ™Œ). In a nutshell, it goes like this: 1) create a lookup table that contains the max(Count) for each value in string2 along with the corresponding string1 value, 2) join this lookup table with the original table and ta da!

Here's the details:

1A) Aggregate your dataset using String2 as a group key and max(Count) as your aggregation. Ignore string1. You'll end up with two columns: string_2, and Count_max. Call this table maxcount_bystring2 or something like that so you can tell it apart from the others.

1B) Join the original table and maxcount_bystring2 using an inner join on the following conditions: string_2 = string_2 and Count=Count_max. Call it something like lookuptable. Under "Selected Columns" make sure that the only column coming from your original dataset is string_1. You'll get a nice-looking table with 3 columns:

Your 'lookup' tableYour 'lookup' table2) Use another join tool on the original dataset. Left join lookuptable on string_2=string_2 and prefix all the columns coming from lookuptable with 'output_' so you don't get them confused. Keep string_1, string_2, and Count from the original dataset. Keep output_string1 from your lookup table. and run the recipe.

3) You'll get a dataset that looks like the one in your picture

ta da!ta da!

 

lmk if this works on your dataset in DSS!

Here's what my Flow looks likeHere's what my Flow looks like

 

cheers,

Ashley