
Compare multiple columns and return the most frequent word


Hi,



I'd like to write a formula that compares multiple columns and returns the most frequent word. I am aiming to aggregate the predictions of multiple machine learning models to see whether that improves accuracy. For example, in the image below, the 1st row would return "Handling", as this is the most common word across the SVC, RandForest and LogisticRegression columns. The 2nd row would return "Handling - Operations". Ignore TicketRootCause, as this is the real answer.



I have done this in Excel with the formula below, but I can't find the equivalent functions in DSS. Any ideas on how I could do this, either by converting the Excel formula below into DSS or by another method?

 



=INDEX(F2:N2,MODE(MATCH(F2:N2,F2:N2,0)))







Thanks,

Ollie

1 Reply

Hi Ollie,



You will probably need to add a "Python function" step with the "Add a new cell for each row" option.



Then you can find the most common word with Python code. Several options are listed at https://stackoverflow.com/questions/48606406/find-most-frequent-value-in-python-dictionary-value-with-maximum-count (in my view, the best answer is the one using collections.Counter).
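
As a rough illustration of the collections.Counter approach, here is a minimal sketch. It assumes the step calls a function named process(row) with the row available as a dict keyed by column name, and that the model columns are named SVC, RandForest and LogisticRegression as in the question. The MODEL_COLUMNS list and the most_frequent helper are made up for this example, so adjust them to your dataset.

```python
from collections import Counter

# Assumed column names, taken from the example in the question.
# TicketRootCause is deliberately left out because it is the real answer.
MODEL_COLUMNS = ["SVC", "RandForest", "LogisticRegression"]


def most_frequent(values):
    """Return the most common non-empty value, or None if there is none."""
    cleaned = [v for v in values if v not in (None, "")]
    if not cleaned:
        return None
    # most_common(1) returns a list with one (value, count) pair.
    return Counter(cleaned).most_common(1)[0][0]


def process(row):
    # Assumption: row behaves like a dict keyed by column name.
    return most_frequent(row.get(col) for col in MODEL_COLUMNS)
```

If several values are tied for the highest count, Counter returns the one it encountered first, so the order of MODEL_COLUMNS decides the winner in that case.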



 
