500MB memory limit warning

jonjohnston2003

Hi, I am processing a large file. I cannot select the whole dataset because of the 500MB memory limit. Is there a way to increase the memory allocation?

Thanks, Jon

  • Turribeach

    Hi, there is no memory limit other than what the OS has available. Are you perhaps confusing it with the sample memory limit in the Explore tab? Note that the Explore tab is just that: a way to explore the data. It is not meant to show all the data, which is why it always works on a sample. Run a recipe and the output will contain all your data. If that's not the case, please explain clearly what you are doing and where you are seeing this memory error. Thanks

  • jonjohnston2003

    Got it, this is helpful. Yes, I was on the Explore tab and was trying to sample the whole dataset, almost 1M records. So the right way to analyze the whole dataset is to run a recipe, not to sample all of it.

  • jonjohnston2003

    One more question: I'm now in the data preparation recipe screen. There is still a Sample button, and it only shows part of the data. There is also a "500MB memory limit reached" warning. Does the recipe process the whole dataset despite the sample memory limit?

  • jonjohnston2003

    Update: I ran the recipe, and it was only applied to the partial data within the 500MB memory limit. I'm not sure how to increase this memory limit so that I can run it on the whole dataset.

  • jonjohnston2003

    More updates: I found the setting under Administration to increase the memory limit. It was set to 500MB by default. I increased it to 2000MB (DSS runs on an Ubuntu server VM with 16GB of memory allocated). However, when opening the dataset, it got stuck at 520K records, and 2 minutes later it reported an error saying the DSS server did not respond. The dataset has 908K records. So there is a way to increase the memory limit, but for some reason DSS cannot open the whole 908K-record dataset.

  • jonjohnston2003

    Adding the error message:

    Oops: an unexpected error occurred

    127.0.0.1:45017 failed to respond

    HTTP code: 500, type: com.dataiku.dss.shadelib.org.apache.http.NoHttpResponseException

  • jonjohnston2003

    Just checking in to see if anyone can help here. Does Dataiku limit how many records can be processed?

  • Turribeach

    What is the point in trying to sample all that data? This is not how you should use Dataiku. You need to analyse the data using the different tools that Dataiku provides. For instance, look at the Analyze option in each column heading in the Explore view. There you have an option to analyse a single column's values over the whole dataset rather than just the sample. You can also use the Charts, Statistics and Status tabs to calculate and chart data as needed. A sample is just that: a sample. You shouldn't try to use Dataiku like a spreadsheet. What particular issue are you trying to address by sampling the whole dataset? Where is your dataset stored?
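The idea behind analysing one column over the whole dataset, rather than displaying every row, can be illustrated with a generic sketch. This is not how Dataiku implements the Analyze option; it assumes rows arrive from some iterable source, and the helper names `read_in_chunks` and `analyze_column` are hypothetical. Only one chunk of rows plus a running count of distinct values is kept in memory, so memory use does not grow with the number of rows the way a full on-screen sample does.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def read_in_chunks(rows: Iterable, chunk_size: int) -> Iterator[list]:
    """Hypothetical helper: yield successive lists of at most `chunk_size` rows."""
    chunk: list = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, partially filled chunk
        yield chunk


def analyze_column(rows: Iterable[dict], column: str, chunk_size: int = 10_000) -> dict:
    """Count missing values and the most frequent values of one column over ALL rows.

    Only one chunk of rows is held in memory at a time; the running state is
    a Counter of distinct values plus two integers.
    """
    counts: Counter = Counter()
    missing = 0
    total = 0
    for chunk in read_in_chunks(rows, chunk_size):
        for row in chunk:
            total += 1
            value = row.get(column)
            if value is None or value == "":
                missing += 1
            else:
                counts[value] += 1
    return {"total": total, "missing": missing, "top_values": counts.most_common(3)}


# Usage with made-up rows: a lazy generator of 908,000 records, never materialized as a list.
rows = ({"country": ["FR", "US", "DE", ""][i % 4]} for i in range(908_000))
print(analyze_column(rows, "country"))
```

With these made-up rows the result covers all 908,000 records, 227,000 of them with an empty `country`, even though no more than 10,000 rows are ever buffered at once.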

  • jonjohnston2003

    Oh, I think I figured it out. I thought that because the recipe shows sample data, it would only run on the selected sample, which is not true. The sample just shows how the recipe steps will modify the data. When the recipe is run, its steps are applied to the whole dataset. In my case, the whole dataset has 908K records and the sample shows 160K (after adding one filter-by-value step, it even shows a 104K reduction). However, this does not mean the recipe will only run on the 160K-record sample. When the recipe is run, Dataiku actually processes the whole 908K-record dataset. Is this right?
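The distinction described above (recipe steps previewed on a sample, then applied to every row when the recipe runs) can be modelled in plain Python. This is a hypothetical sketch, not Dataiku's implementation: `Step`, `apply_steps`, `preview`, `run` and the `keep_large` filter are illustrative names, the "sample" is simply the first N rows, and the dataset is made up.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical model of a recipe step: return the (possibly modified) row, or None to drop it.
Step = Callable[[dict], Optional[dict]]


def apply_steps(rows: Iterable[dict], steps: list[Step]) -> Iterator[dict]:
    """Lazily push each row through every step, skipping rows that a step drops."""
    for row in rows:
        for step in steps:
            row = step(row)
            if row is None:
                break
        else:  # no step dropped the row
            yield row


def preview(rows: Iterable[dict], steps: list[Step], sample_size: int) -> list[dict]:
    """Design-time view: apply the steps to the first `sample_size` rows only."""
    return list(apply_steps(islice(rows, sample_size), steps))


def run(rows: Iterable[dict], steps: list[Step]) -> int:
    """Run time: stream every row through the same steps and count the output rows,
    without keeping them all in memory."""
    return sum(1 for _ in apply_steps(rows, steps))


def make_rows() -> Iterator[dict]:
    """Made-up dataset of 908,000 rows, generated lazily."""
    return ({"id": i, "amount": i % 100} for i in range(908_000))


def keep_large(row: dict) -> Optional[dict]:
    """Hypothetical filter-by-value step: keep rows whose amount is at least 50."""
    return row if row["amount"] >= 50 else None


sample_view = preview(make_rows(), [keep_large], sample_size=160_000)
full_count = run(make_rows(), [keep_large])
print(len(sample_view), full_count)  # 80000 454000
```

The same list of steps drives both paths: the preview only ever touches the first 160,000 rows, while the run streams all 908,000 rows through the filter. The sample size limits what is displayed, not what the run processes.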

  • jonjohnston2003

    Thank you so much for bearing with a newbie like me.