
Compare multiple columns and return the most frequent word



I'd like to write a formula that compares multiple columns and returns the most frequent word. I am aiming to aggregate the predictions of multiple machine learning models to see if that improves accuracy. As an example, based on the image below, the 1st row would return "Handling", as this is the most common word across the SVC, RandForest and LogisticRegression columns. The 2nd row would return "Handling - Operations". Ignore TicketRootCause, as this is the real answer.

I have done this in Excel with the formula below but can't find the equivalent functions in DSS. Any ideas how I could do this, either by converting the Excel formula into DSS or by another method?





1 Reply

Hi Ollie,

You probably have to add a "Python function" step with "Add a new cell for each row".

Then you can find the most common word with Python code. There are several ways to do that; for me the best option is collections.Counter.
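Here is a rough sketch of what that step could look like. It assumes the Python function step calls a function named process(row) with the row as a dict-like object and uses the returned value as the new cell, so please check that against the template DSS generates for you. The column names are taken from your example, and MODEL_COLUMNS is just a name I made up for illustration.

```python
from collections import Counter

# Columns holding the model predictions (names taken from the question).
# TicketRootCause is deliberately left out because it is the real answer.
MODEL_COLUMNS = ["SVC", "RandForest", "LogisticRegression"]


def process(row):
    """Return the most frequent prediction across the model columns."""
    # Skip missing or empty predictions so they cannot win the vote.
    votes = [row[col] for col in MODEL_COLUMNS if row.get(col)]
    if not votes:
        return None
    # most_common(1) returns a list with one (value, count) pair.
    # On a tie, the value that was seen first is returned.
    return Counter(votes).most_common(1)[0][0]
```

With three models a full three-way disagreement is possible. In that case this sketch simply returns the first column's prediction, so put the model you trust most first in MODEL_COLUMNS.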



