Using swap memory on Design Node

tgb417 · ‎09-05-2021

I'm working with a small DSS Design Node with 16 GB of RAM on AWS.

The models I'm trying to build are running out of available memory and crashing. I can cut the size of the sample I'm working with, but to get this to work I'm using only 1/3 of 1% (0.3333%) of the data in my model-building process. (The data set size ranges from 300,000 to 1,000,000 records.) With this approach, I can build models incrementally, adding a different sample of records each time I run the model. However, it is clear to me that this is not typical best practice.

So I need some more working memory. I'm working with a non-profit, so budget is a consideration.

Has anyone used swap memory for data science with a Python model? Does anyone have any success stories or horror stories about where swap did or did not work?

Thanks for any insights you can share.

--Tom

Sadig_Mammadov · ‎03-31-2022

Hey, you can use the "chunksize" option in Python recipes to split your dataframe into pieces during processing and reduce memory usage. For example:

for df in source_data.iter_dataframes(chunksize=3000000):
    ...  # process each chunk (a pandas DataFrame) here

https://doc.dataiku.com/dss/latest/python-api/datasets-data.html
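
To make the chunking idea concrete, here is a minimal sketch of the pattern: read a bounded number of rows at a time and keep only small running totals between chunks. It is plain Python so it runs outside DSS; `iter_chunks` and `streaming_mean` are hypothetical helper names, and the rows are made-up data. In a real recipe the loop would presumably run over `source_data.iter_dataframes(chunksize=...)` instead, with each chunk being a pandas DataFrame.

```python
from itertools import islice


def iter_chunks(rows, chunksize):
    """Hypothetical stand-in for chunked reading.

    Yields lists of at most `chunksize` rows, so only one chunk
    is held in memory at a time.
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


def streaming_mean(chunks, column):
    """Mean of `column`, computed chunk by chunk.

    Only two running totals survive between chunks, so peak memory
    depends on the chunk size rather than on the full dataset.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(row[column] for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Made-up example data: 10 rows with amount = 0..9, produced lazily.
rows = ({"amount": i} for i in range(10))
mean_amount = streaming_mean(iter_chunks(rows, chunksize=4), "amount")
print(mean_amount)
```

Note that chunking only helps if the chunk size is smaller than the dataset. With 300,000 to 1,000,000 records, a chunksize of 3000000 would likely return everything in a single chunk, so a much smaller value is probably what you want. It also only helps when each chunk can be processed and then discarded; appending every chunk to one big dataframe brings the memory problem back.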
