Hi all,
I am trying to compare two datasets that have the same columns and find the differences at the column level. The goal is to count, for each column, the number of rows (identified by a unique key) whose values match exactly in both datasets.
Please refer to the sample data below.
Dataset 1
ID | Name | Age | Country |
1 | ABC | 21 | USA |
2 | XYZ | 23 | UK |
3 | DEF | 67 | CHN |
Dataset 2
ID | Name | Age | Country |
1 | ABC | 22 | USA |
2 | XYZ | 23 | UK |
3 | DEF | 67 | SWZ |
Output
Column | Count of ID |
Name | 3 |
Age | 2 |
Country | 2 |
Thanks
Hi,
This would probably be best handled by writing your own code, in either Python or R. In that case, add both datasets as inputs to your code recipe and write the result to an output dataset. More information about Python and R recipes in DSS can be found in our documentation here:
https://doc.dataiku.com/dss/latest/code_recipes/python.html
https://doc.dataiku.com/dss/latest/code_recipes/r.html
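As a rough illustration of the Python route, here is a minimal sketch using pandas. It assumes the key column is named ID, that both datasets share the same non-key columns, and that each ID appears once per dataset. The function name count_matches and the inline sample frames are only for illustration; in a recipe, the two DataFrames would come from the input datasets instead.

```python
import pandas as pd

# Sample data from the question, built inline so the sketch runs on its own.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["ABC", "XYZ", "DEF"],
    "Age": [21, 23, 67],
    "Country": ["USA", "UK", "CHN"],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["ABC", "XYZ", "DEF"],
    "Age": [22, 23, 67],
    "Country": ["USA", "UK", "SWZ"],
})


def count_matches(left, right, key="ID"):
    """Count, per non-key column, the keys whose values are equal in both frames."""
    # Inner join on the key, so only IDs present in both datasets are compared.
    merged = left.merge(right, on=key, suffixes=("_1", "_2"))
    columns = [c for c in left.columns if c != key]
    rows = []
    for col in columns:
        # Note: missing values (NaN) never compare equal, so they count as mismatches.
        matches = int((merged[col + "_1"] == merged[col + "_2"]).sum())
        rows.append({"Column": col, "Count of ID": matches})
    return pd.DataFrame(rows, columns=["Column", "Count of ID"])


result = count_matches(df1, df2)
print(result)
```

On the sample data this gives 3 for Name, 2 for Age, and 2 for Country, which matches the expected output in the question. Inside a DSS Python recipe, the inputs would typically be read with dataiku.Dataset("...").get_dataframe() and the result written with write_with_schema; the dataset names depend on your project.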
Thanks,
Andrew