Python Merging with multiple keys

Solved!
DaitakuNapoleon
Level 2
Python Merging with multiple keys

Hi all,

I'm trying to learn Python for Dataiku DSS but I have a problem with a line of code. I want to merge two dataset with Python. I understood how to merge two dataset with one column in common:

# - * - coding: utf-8 - * -
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu


# Read recipe inputs
co2_and_Oil = dataiku.Dataset ("CO2_and_Oil")
co2_and_Oil_df = co2_and_Oil.get_dataframe ()
meat_and_Egg_Production = dataiku.Dataset ("Meat_and_Egg_Production")
meat_and_Egg_Production_df = meat_and_Egg_Production.get_dataframe ()


#dataframe definition's
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", right_on = "Entity")


merge_Python = dataiku.Dataset ("Merge_Python")
merge_Python.write_with_schema (merge_Python_df)


but how does it work when the datasets need to be merge with multiple columns in commun? I tried this:

#dataframe definition
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", "Code", "Year", right_on = "Entity", "Code", "Year")

 

but it doesn't work, could you help me?

Cordially,

0 Kudos
1 Solution
ktgross15
Dataiker

Hello!

When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.

That said, you'll want to throw all your merge keys into lists, like below:

 

merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])

 

There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns

Let me know if you have any additional questions.

Best,

Katie

 

View solution in original post

0 Kudos
1 Reply
ktgross15
Dataiker

Hello!

When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.

That said, you'll want to throw all your merge keys into lists, like below:

 

merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])

 

There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns

Let me know if you have any additional questions.

Best,

Katie

 

0 Kudos