Python Merging with multiple keys
Hi all,
I'm trying to learn Python for Dataiku DSS but I have a problem with a line of code. I want to merge two dataset with Python. I understood how to merge two dataset with one column in common:
# - * - coding: utf-8 - * - import dataiku import pandas as pd, numpy as np from dataiku import pandasutils as pdu # Read recipe inputs co2_and_Oil = dataiku.Dataset ("CO2_and_Oil") co2_and_Oil_df = co2_and_Oil.get_dataframe () meat_and_Egg_Production = dataiku.Dataset ("Meat_and_Egg_Production") meat_and_Egg_Production_df = meat_and_Egg_Production.get_dataframe () #dataframe definition's merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", right_on = "Entity") merge_Python = dataiku.Dataset ("Merge_Python") merge_Python.write_with_schema (merge_Python_df)
but how does it work when the datasets need to be merge with multiple columns in commun? I tried this:
#dataframe definition merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", "Code", "Year", right_on = "Entity", "Code", "Year")
but it doesn't work, could you help me?
Cordially,
Best Answer
-
Hello!
When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.
That said, you'll want to throw all your merge keys into lists, like below:
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])
There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns
Let me know if you have any additional questions.
Best,
Katie