How to concatenate dataframes
Hello, say I have 2 separate dataframes (df) with the same features (or columns, if you prefer), as follows: col1: name, col2: phone, col3: price
df1 has 100 rows
df2 has 50 rows
I want to concatenate these 2 dataframes into a single one, which would give a dataframe of 150 rows.
In Python, this takes only a couple of lines, for example:
import pandas as pd
dfAll = [df1, df2]
frame = pd.concat(dfAll)
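A self-contained version of the snippet above, as a minimal sketch. The sample rows are made up for illustration, and `ignore_index=True` is an assumption added here so the result gets a fresh 0 to 149 index instead of repeating the index labels of each input.

```python
import pandas as pd

# Made-up sample data with the three columns from the question.
df1 = pd.DataFrame({
    "name": [f"person_{i}" for i in range(100)],
    "phone": [f"555-{i:04d}" for i in range(100)],
    "price": [float(i) for i in range(100)],
})
df2 = pd.DataFrame({
    "name": [f"other_{i}" for i in range(50)],
    "phone": [f"556-{i:04d}" for i in range(50)],
    "price": [float(i) * 2 for i in range(50)],
})

# Stack the rows of df2 under the rows of df1.
frame = pd.concat([df1, df2], ignore_index=True)
print(frame.shape)
```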
How can I do this in DSS? I do not see any way to concatenate files, though there are several ways to join them (which is a bit different).
Thanks
Best Answer
-
Hi,
You can use a stack recipe to concatenate two or more datasets vertically. Please refer to this article for more information on how to use it:
https://knowledge.dataiku.com/latest/courses/visual-recipes/stack.html
Best regards,
Alexandre
Answers
-
LaurentS
Hi, and thanks. I had not noticed this recipe. Kindest regards
-
I think the stack option does not work when we want to concatenate two dataframes horizontally and they have different column names. The union option can do this, but it then generates empty cells.
I am not sure whether we can achieve something similar to this example without generating empty cells:
https://stackoverflow.com/questions/44723377/pandas-combining-two-dataframes-horizontally
The only option I see is to use a Jupyter notebook to concatenate the two dataframes horizontally without creating empty cells.
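In pandas, the horizontal case could be sketched as below. The frames and column names are made up for illustration. In a horizontal `pd.concat`, empty cells (NaN) appear when the two indexes do not line up, so the sketch resets both indexes first. This assumes the rows should be paired by position and that both frames have the same number of rows.

```python
import pandas as pd

# Two made-up frames with different column names and different index labels.
left = pd.DataFrame({"name": ["Ann", "Bob", "Cy"]}, index=[10, 11, 12])
right = pd.DataFrame({"price": [1.5, 2.0, 3.25]}, index=[0, 1, 2])

# Concatenating as-is aligns on index labels, which produces empty cells.
misaligned = pd.concat([left, right], axis=1)

# Resetting both indexes pairs the rows by position instead.
combined = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)
print(combined)
```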
-
tgb417
Welcome to the Dataiku Community. Glad to have you here.
I'm not aware of a visual recipe that does exactly this step in a dedicated, visual-only way. That said, we do have full access to Python recipes in Dataiku, and there is some code in the Stack Overflow article you linked. You might take a look at this Dataiku Academy lesson about using Python recipes:
https://academy.dataiku.com/path/wild-code/python-and-dataiku-dss/506831
You would only really need to change this part of the recipe.
# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
orders_by_customer_df = orders_df  # For this sample code, simply copy input to output
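As a rough sketch of what could replace that placeholder line, here is a small helper that puts two dataframes side by side. The helper name `combine_side_by_side` and the second input `customers_df` are hypothetical, and the sample frames stand in for whatever the recipe reads from its input datasets. It assumes rows are matched by position, and it raises an error rather than producing empty cells when the row counts differ.

```python
import pandas as pd


def combine_side_by_side(left_df: pd.DataFrame, right_df: pd.DataFrame) -> pd.DataFrame:
    """Place two dataframes next to each other, pairing rows by position."""
    if len(left_df) != len(right_df):
        # Unequal lengths would leave empty cells, so fail loudly instead.
        raise ValueError("Both inputs must have the same number of rows")
    # Reset both indexes so rows are paired by position, not by index label.
    return pd.concat(
        [left_df.reset_index(drop=True), right_df.reset_index(drop=True)],
        axis=1,
    )


# Stand-ins for the dataframes the recipe would read from its inputs.
orders_df = pd.DataFrame({"order_id": [1, 2, 3]})
customers_df = pd.DataFrame({"customer": ["Ann", "Bob", "Cy"]})  # hypothetical second input

orders_by_customer_df = combine_side_by_side(orders_df, customers_df)
```

In an actual recipe, only the last line would take the place of the "copy input to output" line, with the helper defined above it.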
--Tom