Selectively tabulating columns based on name
Hi -
I'm trying to add column values based on the column name containing a specific word.
For example, here is a sample dataset that shows for each company, what the lead source was and count:
Deal Name | Paid-LinkedIn-TopTen | Paid-PPC-Display | Paid-LinkedIn-Trends |
Apple | 3 | 0 | 1 |
IBM | 0 | 1 | 0 |
1 | 1 | 0 |
This is showing that Apple deal originated from 3 people at apple clicking the "top ten" linked in ad, and one person clicking the "Trends" linkedin ad. In the real dataset, there are about 60 different lead sources (columns).
The output I'm seeking is: "For each deal show how many contacts came from linkedin", or more specifically referencing the above data set "For each deal, add together the numbers in every column where the column name contains 'LinkedIn'"
The output from the above table should be:
All LinkedIn Sources | |
Apple | 4 |
IBM | 0 |
1 |
I'd appreciate any pointers of how I could go about trying to derive that.
Thanks
Chris
Best Answer
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
In a prepare recipe, you can use the "fold multiple columns by pattern" processor to transform your data from wide to long format. Use the smart pattern builder to fold only the columns that start with "Paid-Linkedin".
Then you can use a Group recipe with your key (Deal Name), adding the values from the fold value column.
Tell me if this helps.
Best regards
Answers
-
Wow ... brilliant. That does the trick nicely. Thanks so much for the tip.