
Using swap memory on Design Node

tgb417
Neuron

I'm working with a small DSS Design Node with 16 GB of RAM on AWS.

The models I'm trying to build run out of available memory and crash. I can shrink the sample I'm working with, but to get this to work I'm using only 1/3 of 1% (0.3333%) of the data in my model-building process (data set sizes range from 300,000 to 1,000,000 records). With this approach, I can build incremental models by adding a different sample of records each time I run the model. However, it is clear to me that this is not typical best practice.
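For scale, 1/3 of 1% is a sampling fraction of 1/300, so each training sample holds only about 1,000 to 3,333 records. A quick check in Python (the helper name is illustrative, not part of DSS):

```python
from fractions import Fraction

# "1/3 of 1%" written exactly: (1/3) * (1/100) == 1/300
SAMPLE_FRACTION = Fraction(1, 3) * Fraction(1, 100)


def sample_rows(total_rows: int, fraction: Fraction = SAMPLE_FRACTION) -> int:
    """Whole records kept when sampling `fraction` of `total_rows` (rounded down)."""
    return total_rows * fraction.numerator // fraction.denominator


for total in (300_000, 1_000_000):
    print(f"{total:,} records -> {sample_rows(total):,} sampled")
# 300,000 records -> 1,000 sampled
# 1,000,000 records -> 3,333 sampled
```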

So I need more working memory. I'm working with a non-profit, so budget is a consideration.

Has anyone used swap memory for data science work with a Python model? Does anyone have success stories or horror stories about cases where swap did or did not work?

Thanks for any insights you can share.

--Tom
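On the swap question: before relying on swap, it helps to see how much RAM and swap the node actually has free. On a Linux host such as an AWS EC2 instance, the kernel reports both in /proc/meminfo (values in kB). The sketch below parses that format; the helper names are hypothetical, and it only reads the figures, it does not create or enable swap. Keep in mind that swap lives on disk, so it is much slower than RAM.

```python
from pathlib import Path

KIB_PER_GIB = 1024 * 1024


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # drop the trailing "kB" unit
    return info


def headroom_gib(info: dict) -> dict:
    """Available RAM and free swap, converted from kB to GiB."""
    return {
        "mem_available_gib": info.get("MemAvailable", 0) / KIB_PER_GIB,
        "swap_free_gib": info.get("SwapFree", 0) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(headroom_gib(parse_meminfo(meminfo.read_text())))
```

Running this before and during model training gives a rough idea of whether the job is close to the 16 GB limit and whether any swap is configured at all.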
Sadig_Mammadov
Level 1

Hey, you can use the "chunksize" option in Python recipes to split your dataframe into pieces during processing and so reduce memory usage. For example:

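```python
import dataiku

source_data = dataiku.Dataset("my_dataset")  # hypothetical dataset name
for df in source_data.iter_dataframes(chunksize=3000000):
    # each df is a pandas DataFrame holding at most `chunksize` rows
    print(df.shape)
```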
https://doc.dataiku.com/dss/latest/python-api/datasets-data.html
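To illustrate why chunking helps: only one chunk is in memory per iteration, so results have to be accumulated across chunks instead of computed on one large DataFrame. Note that `chunksize` counts rows, so with data sets of 300,000 to 1,000,000 records a value of 3,000,000 returns everything in a single chunk; a smaller value such as 100,000 yields several chunks. The sketch below takes any iterable of pandas DataFrames; the function names, column name, and dataset name are hypothetical. `sample_chunks` also shows how the random training sample from the question could be drawn without loading the full data set, using pandas `DataFrame.sample(frac=...)` on each chunk.

```python
from typing import Iterable, Iterator

import pandas as pd


def chunked_mean(chunks: Iterable[pd.DataFrame], column: str) -> float:
    """Mean of `column`, accumulated one chunk at a time.

    Only a running total and row count survive between chunks, so peak
    memory is roughly one chunk rather than the whole data set.
    """
    total, count = 0.0, 0
    for df in chunks:
        values = df[column].dropna()
        total += float(values.sum())
        count += int(values.count())
    if count == 0:
        raise ValueError(f"no non-null values in column {column!r}")
    return total / count


def sample_chunks(chunks: Iterable[pd.DataFrame], frac: float, seed: int = 0) -> pd.DataFrame:
    """Random sample of about `frac` of all rows; only sampled rows are kept."""
    pieces = [
        # a different but reproducible seed for each chunk
        df.sample(frac=frac, random_state=seed + i)
        for i, df in enumerate(chunks)
    ]
    return pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame()


def split_rows(df: pd.DataFrame, chunksize: int) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in for iter_dataframes, for trying this outside DSS."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


# Inside a DSS Python recipe the chunks would come from the API instead
# (dataset and column names are hypothetical):
#   import dataiku
#   source_data = dataiku.Dataset("my_dataset")
#   print(chunked_mean(source_data.iter_dataframes(chunksize=100000), "amount"))
```

Because the chunk iterator is consumed as it is read, computing both a statistic and a sample means calling `iter_dataframes` twice. Changing `seed` between runs gives a different sample each time.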
