Out-of-memory error in training job
Hi Team,
We have installed DSS on a virtual machine that is not exposed to any inbound or outbound Internet traffic. We are using the licensed Enterprise DSS version.
While running a training job with the Graph analytics plugin and the built-in Quick training session, we encountered an out-of-memory Python error. We also checked the server utilization while the recipe was running and found that memory and storage usage reached 100%, and at times beyond that. I am attaching both screenshots here.
Can you please advise on a quick fix for this issue, and on what needs to be done to avoid it now and in the future? Please feel free to add a comment to this thread if you have any questions or need more information for the investigation.
Answers
-
tgb417
I have definitely run out of memory on my DSS server while creating a number of different types of models.
From that experience I know that 16 GB of RAM is not really a lot of memory for a production DSS server. I definitely hear of folks using 64 GB, 92 GB, and the like on their nodes. Data science can be heavy on memory usage.
That said, looking at your two top displays, you don't seem to be completely out of memory at the time you took the snapshots. You do appear to be using ~12 GB for one of your Python processes.
If you cannot allocate more memory, can you at least test on a smaller dataset and see how many records you can process? I've only played with graph analysis on small datasets, so I'm not clear how memory-efficient the graph processes are in DSS. (I don't remember exactly, but I suspect I only worked with a few hundred to a few thousand nodes.)
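For instance, in a Python recipe you could cap the number of rows you read and watch how memory scales as you raise the limit. Here is a minimal sketch; the dataset name is just a placeholder:

```python
import dataiku

# Hypothetical input dataset name -- substitute your own.
dataset = dataiku.Dataset("graph_edges")

# Read only the first 100,000 rows instead of the full dataset;
# increase the limit gradually while watching memory usage.
df = dataset.get_dataframe(limit=100000)

print(f"Loaded {len(df)} rows, "
      f"~{df.memory_usage(deep=True).sum() / 1e9:.2f} GB in memory")
```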
As an enterprise customer, you might want to open a support ticket with the Dataiku support team. Over the years they have been very helpful to me.
-
Turribeach
16 GB is barely enough RAM for a PC, let alone for a server running machine learning software. Be realistic about what you want to achieve with desktop amounts of RAM. Python, pandas, and data frames are all great, but pandas loads all rows into memory, which is not really suitable for large datasets. Look at your code and make sure you are not creating unnecessary dataframes. Also look at this SO answer on how Python deals with memory. Finally, if all of the above fails, look to run/offload your workload on an engine that can support it. One of the big advantages of Dataiku is its capability to run workloads in most engines out there. So move your data to an engine that supports your workload, like GCP's BigQuery.
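As a rough sketch of two habits that help before you resort to a bigger engine: stream the data in chunks rather than loading it all at once, and downcast wide numeric types. The dataset and column names below are placeholders, not your actual project:

```python
import dataiku
import pandas as pd

dataset = dataiku.Dataset("big_input")          # hypothetical input
output = dataiku.Dataset("aggregated_output")   # hypothetical output

# 1) Stream the dataset in chunks instead of loading all rows at once.
partial_sums = []
for chunk in dataset.iter_dataframes(chunksize=200000):
    # 2) Downcast float64 columns to shrink each chunk's footprint.
    for col in chunk.select_dtypes("float64").columns:
        chunk[col] = pd.to_numeric(chunk[col], downcast="float")
    partial_sums.append(chunk.groupby("key_column")["value"].sum())

# Combine the per-chunk aggregates into the final result.
result = pd.concat(partial_sums).groupby(level=0).sum().reset_index()
output.write_with_schema(result)
```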
-
tgb417
If one is willing to trade time for the ability to work with larger data loads, one can install PostgreSQL on desktop-class systems and extend the size of the dataset you can process. I've definitely processed hundreds of thousands of records, even a few million rows, on 8 GB and 16 GB systems. The performance isn't great, but if that's what you have, it can work for analysis and experimentation.
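As a sketch of what I mean, you can let PostgreSQL do the aggregation and only pull the small summary back into pandas. The connection string, table, and column names below are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details -- point this at your own PostgreSQL.
engine = create_engine("postgresql://user:password@localhost:5432/analytics")

# The database does the heavy lifting; pandas only holds the summary,
# so the memory footprint stays small regardless of table size.
query = """
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
"""
summary = pd.read_sql(query, engine)
print(summary.head())
```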