The Python process died (killed - maybe out of memory ?)

vamsee51
Level 1

I have included the screenshots of my python recipe and the errors.

Kindly help!!


Operating system used: Windows 11

AlexT
Dataiker

Hi @vamsee51,

The recipe was already using up to 41GB of memory.
We know this by looking at the entry "vmRSSPeakMB" : 41707 in your screenshots.

You will need to either increase the memory available to execute the recipe: if running locally, that may mean adjusting the cgroups configuration and/or increasing the total memory available to the DSS instance; if running in a container, you may need larger node types and different containerized execution configs.

The other approach is to reduce the memory usage of your recipe.

Some ways to achieve this could be (a short sketch follows the documentation link below):
1) Reduce the sampling size, and the chunk size if you are using chunked reading.
2) Delete any unused intermediate data frames.
3) Chain several operations to avoid creating intermediate data frames in the first place.

https://doc.dataiku.com/dss/latest/python-api/datasets-data.html
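For example, a rough sketch along these lines keeps only one chunk in memory at a time and avoids intermediate data frames. Note that the dataset names, the column name and the transformation are placeholders, not taken from your recipe, so adapt them to your own flow:

import dataiku

# NOTE: "my_input", "my_output" and "some_column" are placeholder names.
input_ds = dataiku.Dataset("my_input")
output_ds = dataiku.Dataset("my_output")

writer = None
try:
    # iter_dataframes() yields pandas DataFrames of at most `chunksize` rows,
    # so only one chunk has to fit in memory at a time.
    for chunk in input_ds.iter_dataframes(chunksize=100000):
        # Chain the operations on the chunk instead of keeping intermediate copies.
        processed = (
            chunk
            .dropna(subset=["some_column"])
            .assign(flag=lambda df: df["some_column"] > 0)
        )

        if writer is None:
            # Declare the output schema from the first processed chunk,
            # then open a streaming writer used for all chunks.
            output_ds.write_schema_from_dataframe(processed)
            writer = output_ds.get_writer()

        writer.write_dataframe(processed)

        # Drop references so the chunk can be garbage-collected before the next one.
        del chunk, processed
finally:
    if writer is not None:
        writer.close()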

Thanks

tgb417

@vamsee51 ,

I've run into similar situations with very large processes. In my case things were not containerized, and the swap size on the OS hosting DSS was set to 0, which for some use cases may make sense (turning on swap can slow things down if you use it too much). In our use case, however, enabling it made a positive improvement without increasing hosting costs. We also ended up reducing the sample size for model building.

There are a number of guides on the internet about tuning memory usage and swap for Linux. One of these may help with the details.

Now that I know you are containerized, please disregard my notes here. @AlexT's suggestion to file a support ticket is a great idea.

--Tom
vamsee51
Level 1
Author

Thank you @AlexT and @tgb417 for your swift response.

I tried increasing the available memory and reducing the sampling and chunk size.

Upon running the recipe, it is still throwing an error, and it seems to be related to the resource control settings rather than the containerized execution. We are currently stuck in the migration process.

I am sharing additional screenshots for your understanding, and I kindly request your assistance on this!

Thank you!!

AlexT
Dataiker

Hi @vamsee51,
Setting the memory request/limit to a very high value does not guarantee that the K8s node this pod was assigned to has that much memory. You may need to add larger nodes to your actual K8s cluster to accommodate the memory usage.

Your recipe is still using 40+ GB of RAM at peak.
I would suggest you submit a support ticket with the job diagnostics so we can advise further.

https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html#getting-a-job-diagnosis

https://support.dataiku.com/support/tickets/new