How to concatenate dataframes

LaurentS Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 21 ✭✭✭✭

Hello, say I have two separate dataframes (df) with similar features (or columns, if you prefer), as follows: col1: name, col2: phone, col3: price

df1 has 100 rows

df2 has 50 rows

I want to concatenate these two dataframes into a single one, which would therefore have 150 rows.

In Python, this takes only a couple of lines, for example:

import pandas as pd

dfAll = [df1, df2]
frame = pd.concat(dfAll)
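
As a quick check of the row arithmetic above, here is a minimal, self-contained sketch. The two frames are made-up stand-ins for df1 and df2 (the values are invented for illustration), and `ignore_index=True` is an optional extra that is not in the original snippet; it simply renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical stand-ins for df1 (100 rows) and df2 (50 rows).
df1 = pd.DataFrame({
    "name": [f"a{i}" for i in range(100)],
    "phone": [f"555-{i:04d}" for i in range(100)],
    "price": [float(i) for i in range(100)],
})
df2 = pd.DataFrame({
    "name": [f"b{i}" for i in range(50)],
    "phone": [f"556-{i:04d}" for i in range(50)],
    "price": [float(i) for i in range(50)],
})

# ignore_index=True renumbers the rows 0..149 instead of keeping
# each frame's original labels (0..99 followed by 0..49).
frame = pd.concat([df1, df2], ignore_index=True)
print(frame.shape)  # (150, 3)
```

Without `ignore_index=True` the result still has 150 rows, but the index labels 0..49 appear twice.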

How can I do this in DSS? I do not see any way to concatenate files, although there are several ways to join them (which is a bit different).

Thanks

Best Answer

Answers

  • LaurentS
    LaurentS Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 21 ✭✭✭✭

    Hi, and thanks. I had not noticed this recipe. Kindest regards

  • ss
    ss Registered Posts: 1 ✭✭✭

    I think the stack option does not work when we want to concatenate two dataframes horizontally and they have different column names. The union option can do this, but it generates empty cells.

    I am not sure whether we can achieve something similar to this example without generating empty cells:

    https://stackoverflow.com/questions/44723377/pandas-combining-two-dataframes-horizontally

    The only option I see is to use a Jupyter notebook to concatenate the two dataframes horizontally without creating empty cells.
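
For reference, a minimal pandas sketch of the horizontal case described above, assuming both frames have the same number of rows and should be paired by position. The helper name `concat_side_by_side` and the sample data are made up for illustration.

```python
import pandas as pd

def concat_side_by_side(left, right):
    """Place two frames next to each other, pairing rows by position."""
    # pd.concat(axis=1) aligns rows on the index, so mismatched index
    # labels produce NaN cells. Resetting both indexes to 0..n-1 makes
    # the rows line up by position instead.
    return pd.concat(
        [left.reset_index(drop=True), right.reset_index(drop=True)],
        axis=1,
    )

# Hypothetical sample frames with different column names and index labels.
left = pd.DataFrame({"name": ["Ann", "Bob"]}, index=[10, 11])
right = pd.DataFrame({"price": [1.5, 2.0]}, index=[0, 1])

combined = concat_side_by_side(left, right)
print(combined.shape)  # (2, 2)
```

If the two frames have different row counts, the shorter one is still padded with NaN, so this only avoids empty cells when the lengths match.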

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    edited July 17

    @ss

    Welcome to the Dataiku Community. Glad to have you here.

    I'm not aware of a visual recipe that does exactly this step in a purely visual way. That said, we do have full access to Python recipes in Dataiku, and the Stack Overflow article already has some code. You might take a look at this Dataiku Academy lesson about using Python recipes:

    https://academy.dataiku.com/path/wild-code/python-and-dataiku-dss/506831

    You would only really need to change this part of the recipe:

    # Compute recipe outputs from inputs
    # TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
    # NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
    
    orders_by_customer_df = orders_df # For this sample code, simply copy input to output

    --Tom
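
As a rough sketch of what that changed part could look like for the horizontal case, assuming a Python recipe with two input datasets and one output dataset: the dataset names and the helpers `combine_inputs` and `run_recipe` are hypothetical, and the `dataiku` package is only importable inside DSS, so it is imported inside the function that needs it.

```python
import pandas as pd

def combine_inputs(left_df, right_df):
    """Concatenate two frames horizontally, pairing rows by position."""
    return pd.concat(
        [left_df.reset_index(drop=True), right_df.reset_index(drop=True)],
        axis=1,
    )

def run_recipe(left_name, right_name, output_name):
    """Read two DSS datasets, combine them, and write the result."""
    import dataiku  # assumption: running inside a DSS Python recipe

    left_df = dataiku.Dataset(left_name).get_dataframe()
    right_df = dataiku.Dataset(right_name).get_dataframe()
    combined_df = combine_inputs(left_df, right_df)
    dataiku.Dataset(output_name).write_with_schema(combined_df)

# Inside the recipe you would then call it with your own dataset names,
# for example (hypothetical names):
# run_recipe("left_dataset", "right_dataset", "combined_dataset")
```

Keeping the pandas step in its own function makes it possible to try it in a notebook on small frames before wiring it to real datasets.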
