Using swap memory on Design Node
I'm working with a small DSS Design Node on AWS with 16 GB of RAM.
The models I'm trying to build run out of available memory and crash. I can reduce the size of the sample I work with, but to get things to run I'm using only about a third of one percent (0.33%) of the data in my model-building process (the datasets range from 300,000 to 1,000,000 records). With that approach I can build incremental models, adding a different sample of records each time I run the model, but it's clear to me that this is not typical best practice.
So I need more working memory. I'm working with a non-profit, so budget is a consideration.
Has anyone used swap memory for data science with a Python model? Does anyone have good news stories, or horror stories, about where swap did or did not work?
Thanks for any insights you can share.
Answers
Hey, you can use the "chunksize" option in Python recipes to split your dataframe into pieces during processing and keep memory usage down. For example:
for df in source_data.iter_dataframes(chunksize=100000):
    ...  # process each chunk here
https://doc.dataiku.com/dss/latest/python-api/datasets-data.html
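To make that concrete, here is a minimal sketch of a chunked DSS Python recipe. The dataset names "source_data" and "scored_data" and the chunk size of 100,000 rows are assumptions for illustration; the pattern writes the output schema from the first processed chunk and streams each chunk to the output, so only one chunk sits in memory at a time.

import dataiku

# Hypothetical input/output dataset names -- replace with your own
source_data = dataiku.Dataset("source_data")
scored_data = dataiku.Dataset("scored_data")

writer = None
for i, df in enumerate(source_data.iter_dataframes(chunksize=100000)):
    # ... transform or score this chunk of the dataframe here ...
    if i == 0:
        # set the output schema from the first processed chunk, then open the writer
        scored_data.write_schema_from_dataframe(df)
        writer = scored_data.get_writer()
    writer.write_dataframe(df)
if writer is not None:
    writer.close()

With a chunk size well below your 300,000 to 1,000,000 record range, each pass only ever holds one slice of the data in memory, which may let the recipe finish on the 16 GB node without resorting to swap.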