I'm trying to learn Python for Dataiku DSS but I have a problem with a line of code. I want to merge two dataset with Python. I understood how to merge two dataset with one column in common:
# - * - coding: utf-8 - * -
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
co2_and_Oil = dataiku.Dataset ("CO2_and_Oil")
co2_and_Oil_df = co2_and_Oil.get_dataframe ()
meat_and_Egg_Production = dataiku.Dataset ("Meat_and_Egg_Production")
meat_and_Egg_Production_df = meat_and_Egg_Production.get_dataframe ()
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", right_on = "Entity")
merge_Python = dataiku.Dataset ("Merge_Python")
but how does it work when the datasets need to be merge with multiple columns in commun? I tried this:
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", "Code", "Year", right_on = "Entity", "Code", "Year")
but it doesn't work, could you help me?
When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.
That said, you'll want to throw all your merge keys into lists, like below:
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])
There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns
Let me know if you have any additional questions.