
Using swap memory on Design Node

tgb417
I'm working with a small DSS Design Node on AWS with 16 GB of RAM.

The models I'm trying to build run out of available memory and crash. I can cut the size of the sample I'm working with, but to get training to complete I have to use only 1/3 of 1% (0.3333%) of the data. (Dataset sizes range from 300,000 to 1,000,000 records.) With this approach, I can build models incrementally, adding a different sample of records each time I run the training. However, it is clear to me that this is not typical best practice.

So I need some more working memory.  I'm working with a non-profit so budget is a consideration.

Has anyone used swap memory for data science work with a Python model? Does anyone have success stories or horror stories about cases where swap did or did not work?

Thanks for any insights you can share.

--Tom
1 Reply
Sadig_Mammadov

Hey, you can use the "chunksize" option in Python recipes to split your dataframe into pieces during processing and so reduce memory usage. For example:

for df in source_data.iter_dataframes(chunksize=3000000):
    ...  # process each chunk (a pandas DataFrame) here

 

https://doc.dataiku.com/dss/latest/python-api/datasets-data.html
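
To make the idea a bit more concrete, here is a minimal sketch of chunk-by-chunk processing. It computes a running mean and standard deviation (Welford's online algorithm) so that only one chunk is in memory at a time. The fake_chunks generator is a hypothetical stand-in so the snippet runs outside DSS, and the dataset name "my_dataset" and column name "amount" in the comment are assumptions, not anything from your project. Note that chunking only lowers peak memory when the chunk size is smaller than the dataset, and it only helps with work that can be done incrementally.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple


def running_mean_std(chunks: Iterable[Sequence[float]]) -> Tuple[int, float, float]:
    """Return (count, mean, sample std) over all values, one chunk at a time.

    Uses Welford's online algorithm, so only the current chunk is held in memory.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for chunk in chunks:
        for value in chunk:
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


def fake_chunks(n_rows: int, chunksize: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for iter_dataframes: yields lists of at most `chunksize` values."""
    for start in range(0, n_rows, chunksize):
        stop = min(start + chunksize, n_rows)
        yield [float(i % 10) for i in range(start, stop)]


# In a DSS Python recipe the chunks would instead come from something like
# (assumed dataset and column names):
#   import dataiku
#   source_data = dataiku.Dataset("my_dataset")
#   chunks = (df["amount"] for df in source_data.iter_dataframes(chunksize=50000))
count, mean, std = running_mean_std(fake_chunks(n_rows=1000, chunksize=100))
print(count, mean, std)
```

The per-value Python loop is slow on large data; it is only there to keep the sketch dependency-free. For model training, the same pattern would need an algorithm that supports incremental fitting.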
