Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi all,
I'm trying to learn Python for Dataiku DSS but I have a problem with a line of code. I want to merge two dataset with Python. I understood how to merge two dataset with one column in common:
# - * - coding: utf-8 - * -
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
co2_and_Oil = dataiku.Dataset ("CO2_and_Oil")
co2_and_Oil_df = co2_and_Oil.get_dataframe ()
meat_and_Egg_Production = dataiku.Dataset ("Meat_and_Egg_Production")
meat_and_Egg_Production_df = meat_and_Egg_Production.get_dataframe ()
#dataframe definition's
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", right_on = "Entity")
merge_Python = dataiku.Dataset ("Merge_Python")
merge_Python.write_with_schema (merge_Python_df)
but how does it work when the datasets need to be merge with multiple columns in commun? I tried this:
#dataframe definition
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = "Entity", "Code", "Year", right_on = "Entity", "Code", "Year")
but it doesn't work, could you help me?
Cordially,
Hello!
When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.
That said, you'll want to throw all your merge keys into lists, like below:
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])
There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns
Let me know if you have any additional questions.
Best,
Katie
Hello!
When you use python in DSS, it's just regular python, so DSS isn't doing anything special to your code, so this is actually just a python question.
That said, you'll want to throw all your merge keys into lists, like below:
merge_Python_df = co2_and_Oil_df.merge (meat_and_Egg_Production_df, left_on = ["Entity", "Code", "Year"], right_on = ["Entity", "Code", "Year"])
There's also a helpful stack overflow article on this: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns
Let me know if you have any additional questions.
Best,
Katie