Python Merging with multiple keys

DaitakuNapoleon
DaitakuNapoleon Registered Posts: 9 ✭✭✭✭
edited July 16 in Using Dataiku

Hi all,

I'm trying to learn Python for Dataiku DSS but I have a problem with a line of code. I want to merge two dataset with Python. I understood how to merge two dataset with one column in common:

# - * - coding: utf-8 - * -
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu


# Read recipe inputs
co2_and_Oil = dataiku.Dataset ("CO2_and_Oil")
co2_and_Oil_df = co2_and_Oil.get_dataframe ()
meat_and_Egg_Production = dataiku.Dataset ("Meat_and_Egg_Production")
meat_and_Egg_Production_df = meat_and_Egg_Production.get_dataframe ()


#dataframe definition's
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", right_on = "Entity")


merge_Python = dataiku.Dataset ("Merge_Python")
merge_Python.write_with_schema (merge_Python_df)


but how does it work when the datasets need to be merge with multiple columns in commun? I tried this:

#dataframe definition
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", "Code", "Year", right_on = "Entity", "Code", "Year")

but it doesn't work, could you help me?

Cordially,

Best Answer

  • Katie
    Katie Dataiker, Registered, Product Ideas Manager Posts: 106 Dataiker
    edited July 17 Answer ✓

    Hello!

    When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.

    That said, you'll want to throw all your merge keys into lists, like below:

    merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])

    There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns

    Let me know if you have any additional questions.

    Best,

    Katie

Setup Info
    Tags
      Help me…