Out of Memory Issue in Python Recipe.

edited July 2024 in Using Dataiku

Hi @AlexT,

I am getting an out-of-memory error while joining two dataframes in a Python recipe; one has 5M records and the other has 44k records.

Error message:

The Python process died (killed - maybe out of memory)

Kindly help.

Regards,

Ankur.

Answers

  • Dataiker replied:

    Hi @Ankur30,

    Doing large joins in pandas is not usually recommended. The best approach here would be to use an SQL database and a visual Join recipe in DSS, letting the SQL engine perform the join.

    Sync the datasets to SQL datasets and then perform the join there. If you must perform the join in pandas, you will need to ensure the process has enough memory, by doing one of the following (a chunked-merge sketch follows the list):

    1) Adjust the cgroup limits and/or add more RAM to your DSS instance, or

    2) Use a larger container configuration (more memory).
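
    If you do stay in pandas, one way to keep peak memory down is to load only the small (44k-row) dataset fully and stream the 5M-row dataset in chunks, merging chunk by chunk. Below is a minimal sketch using the dataiku API; the dataset names (small_dataset, large_dataset, joined_output), the join key customer_id, and the chunk size are hypothetical placeholders to adapt to your flow:

        import dataiku

        # Load the small (44k-row) dataset fully into memory.
        small_df = dataiku.Dataset("small_dataset").get_dataframe()

        large_ds = dataiku.Dataset("large_dataset")
        output_ds = dataiku.Dataset("joined_output")

        writer = None
        try:
            # Stream the 5M-row dataset in chunks so the full join result
            # never has to fit in memory at once.
            for chunk in large_ds.iter_dataframes(chunksize=200000):
                joined = chunk.merge(small_df, on="customer_id", how="left")
                if writer is None:
                    # Set the output schema from the first joined chunk,
                    # then open the writer.
                    output_ds.write_schema_from_dataframe(joined)
                    writer = output_ds.get_writer()
                writer.write_dataframe(joined)
        finally:
            if writer is not None:
                writer.close()

    Reducing the chunk size lowers peak memory at the cost of more write calls; dropping unneeded columns and downcasting dtypes before the merge also helps.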
