How to access jupyter notebooks setup for Dataiku?

Hi, I am working with R Jupyter notebooks while preparing Flow steps. Unfortunately, because of the size of the data, my notebook keeps crashing. Where can I find the Jupyter configuration and the log files for the notebooks used in my workflows?
Dataiker

Hi,



You can find the ipython log file in the DSS administration screen, under Maintenance > Log files > ipython.log. Alternatively, if you have access to the server by command line, the file is located in <DSS_DATA_DIR>/run/ipython.log.
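If you would rather inspect the log from an R session on the server, here is a minimal sketch. `tail_log` is a hypothetical helper (not part of DSS), and the `<DSS_DATA_DIR>` placeholder has to be replaced with your own data directory. It reads the whole file into memory, so it is only meant for reasonably sized logs.

```r
# Hypothetical helper: return the last `n` lines of a text log file.
tail_log <- function(path, n = 50L) {
  stopifnot(file.exists(path))
  lines <- readLines(path, warn = FALSE)
  utils::tail(lines, n)
}

# Placeholder path: substitute your own DSS data directory.
log_path <- file.path("<DSS_DATA_DIR>", "run", "ipython.log")
if (file.exists(log_path)) {
  writeLines(tail_log(log_path, n = 100L))
}
```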



Having said that, it is a bit surprising to get a memory crash on a 1GB CSV file. Assuming you use pandas to load and transform it, a good rule of thumb is to have 5-10GB of free memory (see http://wesmckinney.com/blog/apache-arrow-pandas-internals/). Have you checked that you are not printing too much to the notebook output?

[EDIT] Sorry, I read too quickly and did not notice that you were using R. Could you check the object.size() of the data once the CSV file has been loaded into an R object?
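For the R case, the size check could look like the sketch below. The data frame `df` is a made-up stand-in for whatever object holds your loaded CSV; only `object.size()` and `gc()` are the actual base R calls of interest.

```r
# Hypothetical stand-in for the data loaded from the CSV file;
# in practice this would be the result of your own read step.
df <- data.frame(id = seq_len(1e5), visits = as.numeric(seq_len(1e5)))

# In-memory size of the object, in bytes and in a human-readable unit.
size_bytes <- as.numeric(object.size(df))
print(object.size(df), units = "MB")

# gc() triggers a garbage collection and reports the memory
# currently used by the whole R session.
mem <- gc()
print(mem)
```

Comparing the object size with the memory available to the notebook gives a first idea of whether a reshape, which typically creates at least one more copy of the data, can fit.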



Are you at liberty to share your code and underlying data?

Cheers,
Alex

Author
Sure. For fun, I tried some forecasting analysis by rerunning the code from the following Kaggle kernel in a Dataiku R notebook: https://www.kaggle.com/merckel/preliminary-investigation-holtwinters-arima/data

I found that the part that crashes is the melt of the forecast results (unique() is probably too much to handle here):

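```
# Compute the date columns once instead of calling unique() twice
date_cols <- unique(keys$Date)

# Keep only the date columns plus "Page", then reshape from wide to long
meltX <- melt(
  X[, which(names(X) %in% c(date_cols, "Page")), with = FALSE],
  measure.vars = date_cols,
  variable.name = "Date",
  value.name = "Visits")
# The Date column comes back as a factor; convert it to character
meltX$Date <- as.character(meltX$Date)
```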
Dataiker
Which package does the melt function come from?
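Both reshape2 and data.table export a `melt` function, and the `with = FALSE` argument in your snippet suggests `X` is a data.table, so it is worth checking which one is actually called. A minimal sketch in base R, where `function_origin` is a hypothetical helper name and `stats::sd` is only used so the example runs without extra packages:

```r
# Hypothetical helper: name of the namespace in which a function is defined.
function_origin <- function(fn) {
  environmentName(environment(fn))
}

# Runnable illustration with a function that ships with R.
print(function_origin(stats::sd))

# In the notebook, after loading your packages, the equivalent calls would be:
#   function_origin(melt)
#   find("melt")   # lists every attached package that provides `melt`
```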