
How to access the Jupyter notebook setup for Dataiku?

TehMP
Level 1
Hi, I am working with R Jupyter notebooks when preparing Flow steps. Unfortunately, due to the size of my data, my notebook keeps crashing. Where can I find the Jupyter configuration and log files for the notebooks used in my workflows?
3 Replies
Alex_Combessie
Dataiker Alumni

Hi,

You can find the ipython log file in the DSS administration screen, under Maintenance > Log files > ipython.log. Alternatively, if you have access to the server by command line, the file is located in <DSS_DATA_DIR>/run/ipython.log.
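If you have shell access, a quick way to inspect that log is to tail it. A sketch; `/data/dataiku` below is a hypothetical stand-in for your actual `<DSS_DATA_DIR>`:

```shell
# Hypothetical location -- substitute your real DSS data directory.
DSS_DATA_DIR=/data/dataiku

# Show the last lines of the Jupyter (ipython) log, where kernel
# crashes usually leave a trace:
if [ -f "$DSS_DATA_DIR/run/ipython.log" ]; then
  tail -n 200 "$DSS_DATA_DIR/run/ipython.log"
else
  echo "ipython.log not found under $DSS_DATA_DIR/run"
fi
```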

Having said that, it is a bit surprising to get a memory crash on a 1 GB CSV file. Assuming you use pandas to load and transform it, a good rule of thumb is to have 5-10 GB of free memory (see http://wesmckinney.com/blog/apache-arrow-pandas-internals/). Have you checked that you are not printing too much to the notebook output? [EDIT] Sorry, I read too quickly and did not notice that you were using R. Could you check the object.size() of the data after the CSV is loaded into an R object?
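For reference, a minimal sketch of how to check object sizes in R. Here `df` is a hypothetical stand-in for whatever your read.csv() or fread() call returns:

```r
# Stand-in for a dataset loaded from CSV:
df <- data.frame(x = runif(1e5), y = sample(letters, 1e5, replace = TRUE))

# In-memory footprint of a single object, in human-readable units:
print(object.size(df), units = "MB")

# All workspace objects by size, largest first -- handy for spotting
# what is actually eating memory in a crashing notebook:
sizes <- sapply(ls(), function(n) object.size(get(n)))
sort(sizes, decreasing = TRUE)
```

Note that the in-memory size of character-heavy data can be several times larger than the CSV on disk.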

Are you at liberty to share your code and underlying data?

Cheers,

Alex

TehMP
Level 1
Author
Sure. For fun, I tried some forecasting analysis by rerunning the following code in a Dataiku R notebook: https://www.kaggle.com/merckel/preliminary-investigation-holtwinters-arima/data.

I found that the crashing part is the melt of the forecast results (probably the unique() call is too much to handle here):

```r
meltX <- melt(
  X[, which(names(X) %in% c(unique(keys$Date), "Page")), with = FALSE],
  measure.vars = unique(keys$Date),
  variable.name = "Date",
  value.name = "Visits")
meltX$Date <- as.character(meltX$Date)
```
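For what it's worth: if `melt` here is `reshape2::melt` being dispatched on a data.table (the `with = FALSE` indexing suggests `X` is one), then `data.table::melt` is typically far more memory-frugal, since it avoids the intermediate copies. A sketch under that assumption, with a toy `X` standing in for the real data:

```r
library(data.table)

# Toy stand-in: one row per Page, one column per date, as in the
# Kaggle web-traffic dataset.
X <- data.table(Page = c("a", "b"),
                `2017-01-01` = c(10, 20),
                `2017-01-02` = c(30, 40))
date_cols <- setdiff(names(X), "Page")

# data.table::melt reshapes wide-to-long without reshape2's copies:
meltX <- melt(X,
              id.vars = "Page",
              measure.vars = date_cols,
              variable.name = "Date",
              value.name = "Visits")
meltX[, Date := as.character(Date)]
```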
Alex_Combessie
Dataiker Alumni
Which package does the melt function come from?
