500MB memory limit warning

jonjohnston2003

Hi, I am processing a large file. I cannot select the whole dataset because of the 500MB memory limit. Is there a way to increase the memory allocation?

Thanks, Jon

Turribeach

Hi, there is no memory limit other than what the OS has available. Are you confusing this with the sample memory limit in the Explore tab? Please note that the Explore tab is just that: a way to explore the data. It is not meant to show all the data, which is why it always works on a sample. Run a recipe and the output should contain all your data. Otherwise, please explain clearly what you are doing and where you are getting this memory error. Thanks

jonjohnston2003

Got it, this is helpful. Yes, I was on the Explore tab and was trying to sample the whole dataset, almost 1M records. So the right way to analyze the whole dataset is to run a recipe, not to sample the whole dataset.

jonjohnston2003

One more question: I am now in the data preparation recipe screen. There is still a Sample button, and it only shows part of the list. There is a "500MB memory limit reached" warning too. Does the recipe process the whole dataset despite the sample memory limit?

jonjohnston2003

Update: I ran the recipe, and it was only applied to the part of the data that fits within the 500MB memory limit. I am not sure how to increase this memory limit so that I can run it on the whole dataset.

jonjohnston2003

More updates: I found the setting under Administration to increase the memory limit. It was set to 500MB by default, and I increased it to 2000MB (I am running DSS on an Ubuntu server VM with 16GB of memory allocated). However, when opening the dataset, it got stuck at 520K records, and 2 minutes later it reported an error saying the DSS server did not respond. The dataset has 908K records. So there is a way to increase the memory limit, but for some reason DSS cannot open the whole 908K-record dataset.

jonjohnston2003

Adding the error message:

 Oops: an unexpected error occurred

127.0.0.1:45017 failed to respond

Please see our options for getting help

HTTP code: 500, type: com.dataiku.dss.shadelib.org.apache.http.NoHttpResponseException
jonjohnston2003

Just checking in to see if anyone can help here. Does Dataiku limit how many records can be processed?


What is the point in trying to sample all that data? This is not how you should use Dataiku. You need to analyze the data using the different tools that Dataiku provides. For instance, look at the Analyze option in each column header in the Explore view. There you have an option to compute values for a single column on the whole dataset. You can also use the Charts, Statistics and Status tabs to calculate and chart data as needed. A sample is just that: a sample. You shouldn't try to use Dataiku like a spreadsheet. What particular issue are you trying to address by sampling the whole dataset? Where is your dataset stored?

jonjohnston2003

Oh, I think I figured it out. I thought that when the recipe shows sample data, it would only run on the selected sample, which is not true. The sample just shows how the recipe steps will modify the data. When the recipe runs, the steps are applied to the whole dataset. In my case, the whole dataset is 908K rows and the sample shows 160K rows (after adding a filter-by-value step, it even shows a 104K reduction). However, this does not mean the recipe will only run on the 160K-row sample. When running the recipe, Dataiku actually processes the whole 908K-row dataset. Is this right?


Correct. That's why you don't need to worry about the sample.
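
To make the distinction concrete, here is a minimal conceptual sketch in plain Python. It is not Dataiku's implementation or API; it only assumes that a recipe can be modeled as an ordered list of steps that is previewed on a sample while designing and then executed on every row. The names `filter_by_value`, `preview` and `run_recipe` are hypothetical, and the row counts are scaled down by 1,000 from the thread (908 rows standing in for 908K, a 160-row sample standing in for 160K).

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def filter_by_value(column: str, value: object) -> Step:
    """Hypothetical 'filter on value' step: keep rows where column == value."""
    def step(rows: List[Row]) -> List[Row]:
        return [r for r in rows if r.get(column) == value]
    return step


def run_steps(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order to the given rows."""
    for step in steps:
        rows = step(rows)
    return rows


def preview(rows: List[Row], steps: List[Step], sample_size: int) -> List[Row]:
    """Design-time preview: the steps only see the first `sample_size` rows."""
    return run_steps(rows[:sample_size], steps)


def run_recipe(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Recipe execution: the same steps are applied to every row."""
    return run_steps(rows, steps)


if __name__ == "__main__":
    # Toy dataset: every third row belongs to region "EU".
    dataset = [{"id": i, "region": "EU" if i % 3 == 0 else "US"} for i in range(908)]
    steps = [filter_by_value("region", "EU")]
    sample_out = preview(dataset, steps, sample_size=160)
    full_out = run_recipe(dataset, steps)
    print(len(sample_out), len(full_out))  # prints "54 303"
```

With this toy data the preview keeps 54 of the 160 sampled rows, while running the recipe keeps 303 of all 908 rows. The sample only changes what you see while designing the steps, not which rows the recipe processes.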

jonjohnston2003

Thank you so much for bearing with a newbie like me.