How to concatenate dataframes

Solved!
LaurentS
Level 3
Hello, say I have two separate dataframes (df) with the same features (or columns, if you prefer), as follows: col1: name, col2: phone, col3: price

df1 has 100 rows

df2 has 50 rows

I want to concatenate these two dataframes into a single dataframe of 150 rows.

In Python, this takes only a few lines of code, for example:

import pandas as pd

dfAll = [df1, df2]
frame = pd.concat(dfAll)

How can I do this in DSS? I do not see any way to concatenate datasets, though there are several ways to join them (which is a bit different).

 

Thanks

1 Solution
AlexandreL
Dataiker

Hi, 

You can use a Stack recipe to concatenate two or more datasets vertically. Please refer to this article for more information on how to use it:

 https://knowledge.dataiku.com/latest/courses/visual-recipes/stack.html

Best regards,

Alexandre

4 Replies
ss
Level 1

I think the Stack recipe does not work when we want to concatenate two dataframes horizontally and they have different column names. The union option can do this, but it then generates empty cells.

I am not sure whether we can achieve something like this example without generating empty cells:

https://stackoverflow.com/questions/44723377/pandas-combining-two-dataframes-horizontally

The only option I see is to use a Jupyter notebook to concatenate the two dataframes horizontally without creating empty cells.
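
For illustration, here is a minimal pandas sketch of the horizontal case. The dataframes and column names are made up, and it assumes both frames have the same number of rows and should be paired by position. It shows where the empty cells come from (pandas aligns rows on the index) and one way to avoid them.

```python
import pandas as pd

# Hypothetical data: different column names, same row count, different index labels.
left = pd.DataFrame({"name": ["Ann", "Bob"], "phone": ["555-0100", "555-0101"]})
right = pd.DataFrame(
    {"price": [10.0, 12.5], "currency": ["EUR", "USD"]},
    index=[10, 11],
)

# A plain horizontal concat aligns rows on the index, so labels that do not
# match on both sides produce empty (NaN) cells.
misaligned = pd.concat([left, right], axis=1)

# Resetting both indexes makes pandas pair the rows by position instead.
side_by_side = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)

print(misaligned)
print(side_by_side)
```

If the two frames have different row counts, the shorter one will still leave empty cells, since there is nothing to pair the extra rows with.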

tgb417

@ss 

Welcome to the Dataiku Community.  Glad to have you here.

I'm not aware of a visual recipe that does exactly this in a dedicated, visual-only way. That said, we do have full access to Python recipes in Dataiku. There is some code in the Stack Overflow question you linked. You might take a look at this Dataiku Academy lesson about using Python recipes.

https://academy.dataiku.com/path/wild-code/python-and-dataiku-dss/506831

You would only really need to change this part of the recipe.

# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

orders_by_customer_df = orders_df # For this sample code, simply copy input to output
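
As a rough sketch of what that replacement could look like for the horizontal case: `concat_side_by_side` and `customers_df` are hypothetical names, and the small stand-in frames are only there so the snippet runs outside DSS. In an actual recipe the inputs would come from the generated `dataiku.Dataset(...).get_dataframe()` lines, and the result would be written by the template's `write_with_schema` call.

```python
import pandas as pd


def concat_side_by_side(left_df, right_df):
    """Place two dataframes next to each other, pairing rows by position."""
    if len(left_df) != len(right_df):
        # Unequal lengths would leave empty cells next to the shorter frame.
        raise ValueError("both dataframes need the same number of rows")
    return pd.concat(
        [left_df.reset_index(drop=True), right_df.reset_index(drop=True)],
        axis=1,
    )


# Stand-ins for the recipe inputs. In DSS these would be read by the template,
# e.g. orders_df = dataiku.Dataset("orders").get_dataframe()
orders_df = pd.DataFrame({"order_id": [1, 2, 3]})
customers_df = pd.DataFrame({"customer": ["a", "b", "c"]})  # hypothetical second input

# This line replaces the template's "simply copy input to output" assignment.
orders_by_customer_df = concat_side_by_side(orders_df, customers_df)
```

Raising an error on mismatched lengths is a design choice here: it fails loudly instead of silently producing the empty cells discussed above.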

--Tom
LaurentS
Level 3
Author

Hi, and thanks. I had not noticed this recipe. Kindest regards
