How to access the Jupyter notebook setup for Dataiku?

TehMP Registered Posts: 3 ✭✭✭✭
Hi, I am working with R Jupyter notebooks when preparing Flow steps. Unfortunately, due to the size of the data, my notebook keeps crashing. Where can I find the Jupyter configuration and the log files of the notebooks used in my workflows?

Answers

  • Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,



    You can find the ipython log file in the DSS administration screen, under Maintenance > Log files > ipython.log. Alternatively, if you have access to the server by command line, the file is located in <DSS_DATA_DIR>/run/ipython.log.
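
    For instance, from an R session on the server, something like this should show the most recent entries (a rough sketch using the placeholder path above):

    ```
    # Sketch: read the last lines of the Jupyter kernel log
    # (replace <DSS_DATA_DIR> with your actual DSS data directory)
    log_path <- file.path("<DSS_DATA_DIR>", "run", "ipython.log")
    tail(readLines(log_path), 50)
    ```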



    Having said that, it is a bit surprising to get a memory crash on a 1GB CSV file. Assuming you use pandas to load and transform it, a good rule of thumb is to have 5-10x the dataset size in free memory, so roughly 5-10GB here (see http://wesmckinney.com/blog/apache-arrow-pandas-internals/). Have you checked that you are not printing too much to the notebook output? [EDIT] Sorry, I read too quickly and did not notice that you were using R. Could you check the object.size() of the data after the CSV file is loaded into an R object?
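
    For example, something like this should show the in-memory size (assuming the loaded object is called X; adjust to your variable name):

    ```
    # Hypothetical object name X: replace with whatever your data is loaded into
    format(object.size(X), units = "auto")
    ```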



    Are you at liberty to share your code and underlying data?



    Cheers,



    Alex

  • TehMP Registered Posts: 3 ✭✭✭✭
    Sure. For fun, I tried some forecasting analysis by rerunning the following code in a Dataiku R notebook: https://www.kaggle.com/merckel/preliminary-investigation-holtwinters-arima/data

    I found that the crashing part is the melt of the forecast results (probably unique() is too much to handle here):

    ```
    meltX <- melt(
      X[, which(names(X) %in% c(unique(keys$Date), "Page")), with = FALSE],
      measure.vars = unique(keys$Date),
      variable.name = "Date",
      value.name = "Visits"
    )
    meltX$Date <- as.character(meltX$Date)
    ```
  • Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Which package does the melt function come from?
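
    If you are not sure, a quick check like this should tell you (assuming melt is already available in your session):

    ```
    # Show which attached package(s) provide a melt() function
    find("melt")
    # Or report the namespace of the melt() that is currently resolved
    environmentName(environment(melt))
    ```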