Comparing two datasets at column level
Hi All
I am trying to compare two datasets that have the same columns and find the differences at column level. The goal is to count, for each column, the number of rows (identified by a unique key) whose values match exactly in both datasets.
Please refer to the sample data below.
Dataset 1
ID | Name | Age | Country |
1 | ABC | 21 | USA |
2 | XYZ | 23 | UK |
3 | DEF | 67 | CHN |
Dataset 2
ID | Name | Age | Country |
1 | ABC | 22 | USA |
2 | XYZ | 23 | UK |
3 | DEF | 67 | SWZ |
Output
Column | Count of ID |
Name | 3 |
Age | 2 |
Country | 2 |
Thanks
Answers
Hi,
This would probably be best handled by writing your own code, in either Python or R. Include both datasets as inputs to your code recipe and store the result in an output dataset. More information about Python and R recipes in DSS can be found in our documentation here:
https://doc.dataiku.com/dss/latest/code_recipes/python.html
https://doc.dataiku.com/dss/latest/code_recipes/r.html
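As a rough illustration of the comparison logic only, here is a minimal plain-Python sketch using the sample data from the question. The function name count_matches_per_column and the variables dataset_1 and dataset_2 are made up for this example, and it assumes each dataset is available as a list of row dictionaries sharing a unique ID key. In a real recipe you would read the rows from your input datasets and write the counts to the output dataset instead of printing them.

```python
def count_matches_per_column(rows_a, rows_b, key="ID"):
    """For each non-key column, count the keys whose values are equal in both datasets."""
    # Index the second dataset by its unique key for direct lookup.
    b_by_key = {row[key]: row for row in rows_b}
    counts = {}
    for row in rows_a:
        other = b_by_key.get(row[key])
        if other is None:
            continue  # key absent from the second dataset: nothing to compare
        for column, value in row.items():
            if column == key:
                continue
            counts.setdefault(column, 0)
            if other.get(column) == value:
                counts[column] += 1
    return counts


# Sample data copied from the question.
dataset_1 = [
    {"ID": 1, "Name": "ABC", "Age": 21, "Country": "USA"},
    {"ID": 2, "Name": "XYZ", "Age": 23, "Country": "UK"},
    {"ID": 3, "Name": "DEF", "Age": 67, "Country": "CHN"},
]
dataset_2 = [
    {"ID": 1, "Name": "ABC", "Age": 22, "Country": "USA"},
    {"ID": 2, "Name": "XYZ", "Age": 23, "Country": "UK"},
    {"ID": 3, "Name": "DEF", "Age": 67, "Country": "SWZ"},
]

result = count_matches_per_column(dataset_1, dataset_2)
print(result)  # {'Name': 3, 'Age': 2, 'Country': 2}
```

Rows whose ID appears in only one dataset are skipped in this sketch; whether they should count as mismatches instead depends on your use case.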
Thanks,
Andrew