
R in DataIku Tutorial


Hello,

I am working through the Dataiku R tutorial and ran into some issues. I installed R 3.6 on my Ubuntu 20 VM, following these directions to make sure I was installing the correct, supported version of R: https://linuxconfig.org/how-to-install-r-on-ubuntu-20-04

When I run through the tutorial, it fails at the first R code section, the one that generates the orders_by_customer output.

I noticed an RJSONIO package error in the report, but that package is not available from the deb repository I am using (E: Unable to locate package r-cran-rjsonio).

Here is the error report from the Tutorial:

[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - ----------------------------------------
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS startup: jek version:8.0.0
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS home: /home/johnny/Documents/dataiku/dataml
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - OS: Linux 5.4.0-42-generic amd64 - Java: Ubuntu 11.0.8
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Allocated a slot for this activity!
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Run activity
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default pre-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checking if sources are ready
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checked source readiness DKU_TUTORIAL_R.orders -> true
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Computing hashes to propagate BEFORE activity
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recorded 1 hashes before activity run
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Building recipe runner of type
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Filling source sizes
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Done filling source sizes
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recipe runner built, will use 1 thread(s)
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Starting execution thread: com.dataiku.dip.dataflow.exec.r.RRecipeRunner@53af4105
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Execution threads started, waiting for activity end
[07:25:23] [INFO] [dku.flow.activity] - Run thread for activity compute_orders_by_customer_NP starting
[07:25:23] [INFO] [dku.venv.selector] - Select code env lang=R projectSelection={"mode":"INHERIT","preventOverride":false} globalDefault=null
[07:25:23] [INFO] [dku.flow.R] - Starting execution of user's R code
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"DKU_TUTORIAL_R/orders_by_customer","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [WARN] [dku.fs.local] - File does not exist: /home/johnny/Documents/dataiku/dataml/managed_datasets/DKU_TUTORIAL_R/orders_by_customer
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM DKU_TUTORIAL_R is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[07:25:23] [INFO] [dku.code.projectLibs] - chunkFolder is /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R
[07:25:23] [INFO] [xxx] - RSRC PATH: ["/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/remote-run-env-def.json
[07:25:23] [INFO] [dku.code.envs.resolution] - Executing R activity in builtin env
[07:25:23] [INFO] [dku.recipes.code.r] - Execute activity command: ["/usr/bin/R","--quiet","--no-save","--args","/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Run command insecurely, from user johnny
[07:25:23] [INFO] [dku.security.process] - Starting process (regular)
[07:25:23] [INFO] [dku.security.process] - Process started with pid=45571
[07:25:23] [INFO] [dku.processes.cgroups] - Will use cgroups []
[07:25:23] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: []
[07:25:23] [INFO] [dku.recipes.code.base] - Process reads from nothing
[07:25:23] [INFO] [dku.resourceusage] - Reporting start of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting start of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.utils]  - > options(echo=T)
[07:25:23] [INFO] [dku.utils]  - > args <- commandArgs(TRUE);
[07:25:23] [INFO] [dku.utils]  - > print (paste("Executing R script: ", args));
[07:25:23] [INFO] [dku.utils]  - [1] "Executing R script:  /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"
[07:25:23] [INFO] [dku.utils]  - > 
[07:25:23] [INFO] [dku.utils]  - > runsRemotely = FALSE;
[07:25:23] [INFO] [dku.utils]  - > jobCwd = NULL;
[07:25:23] [INFO] [dku.utils]  - > dkuExecEnv = NULL;
[07:25:23] [INFO] [dku.utils]  - > scriptFile = args[1]
[07:25:23] [INFO] [dku.utils]  - > if (file.exists("remote-run-env-def.json")) {
[07:25:23] [INFO] [dku.utils]  - +     library("RJSONIO");
[07:25:23] [INFO] [dku.utils]  - +     dkuExecEnv = fromJSON(file("remote-run-env-def.json"))
[07:25:23] [INFO] [dku.utils]  - +     runsRemotely = dkuExecEnv$runsRemotely
[07:25:23] [INFO] [dku.utils]  - + }
[07:25:23] [INFO] [dku.utils]  - Error in library("RJSONIO") : there is no package called ‘RJSONIO’
[07:25:23] [INFO] [dku.utils]  - Execution halted
[07:25:23] [WARN] [dku.resource]  - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.usage.computeresource.jek]  - Reporting update of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [DEBUG] [dku.resource]  - Process stats for pid 45571: {"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}
[07:25:23] [WARN] [dku.resource] - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"endTime":1596021923679,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.flow.activity] - Run thread failed for activity compute_orders_by_customer_NP
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
	at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
	at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - activity is finished
[07:25:23] [ERROR] [dku.flow.activity] running compute_orders_by_customer_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
	at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
	at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default post-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Removing samples for DKU_TUTORIAL_R.orders_by_customer
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Done post-activity tasks

 

5 Replies
Dataiker

Hi, 

You have installed R globally on the system. For R to work in DSS, you also need to set up the R integration. Please follow this link to do so:

https://doc.dataiku.com/dss/latest/installation/r.html#case-1-automatic-installation-if-your-dss-ser...

 

Level 2
Author

Hi,

 

Thank you for the link. R is now available as a kernel in the Jupyter notebook; however, the dataiku library is now broken and I do not know how to reinstall it.

 

Error: package or namespace load failed for ‘dataiku’:
 package ‘dataiku’ was installed before R 4.0.0: please re-install it
Traceback:

1. library(dataiku)
2. tryCatch({
 .     attr(package, "LibPath") <- which.lib.loc
 .     ns <- loadNamespace(package, lib.loc)
 .     env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)
 . }, error = function(e) {
 .     P <- if (!is.null(cc <- conditionCall(e))) 
 .         paste(" in", deparse(cc)[1L])
 .     else ""
 .     msg <- gettextf("package or namespace load failed for %s%s:\n %s", 
 .         sQuote(package), P, conditionMessage(e))
 .     if (logical.return) 
 .         message(paste("Error:", msg), domain = NA)
 .     else stop(msg, call. = FALSE, domain = NA)
 . })
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. value[[3L]](cond)
6. stop(msg, call. = FALSE, domain = NA)
Dataiker

It looks like you have installed R v4.0. To double-check, please run this in an R notebook:

R.Version()

DSS doesn't support R version 4.0. You need to install 3.6. 
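
The same check can also be done from a terminal by parsing the first line of `R --version`. The sketch below is only an illustration: the helper names `parse_r_version` and `r_version_status` are hypothetical, and the "3.6.x only" rule simply encodes the statement above for this DSS 8 setup. It is an assumption, not an official compatibility list.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DSS or R).

# Extract "X.Y.Z" from the first line of `R --version` output,
# which normally starts with "R version X.Y.Z (...)".
parse_r_version() {
    printf '%s\n' "$1" | sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# Print "supported" for 3.6.x, "unsupported" for anything else.
# Assumption: only R 3.6 works here, as stated in the reply above.
r_version_status() {
    case "$1" in
        3.6|3.6.*) echo "supported" ;;
        *)         echo "unsupported" ;;
    esac
}
```

With R on the PATH, this could be used as `r_version_status "$(parse_r_version "$(R --version)")"`.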

Dataiker

@webzest I see that you opened another thread here: https://community.dataiku.com/t5/Using-Dataiku-DSS/DATAIKU-R-Package-is-older-and-failing-to-load-in...

Basically, there are several options to install R in DSS:

1) Install it manually as you did and then run ./bin/dssadmin install-R-integration

2) Use DSS deps installer: dataiku-dss-VERSION/scripts/install/install-deps.sh -check -without-java -without-python -with-r

In your case (R v4 is already installed), you will need to: remove the system-wide R (apt-get remove <R_package> or something like that), delete the <DATA_DIR>/R.lib directory (the DSS-specific R package library), install R v3.6, and then re-run the ./bin/dssadmin install-R-integration script.
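
As a rough sketch, those steps can be written out as a dry run that only prints the commands so they can be reviewed before anything is deleted. The function name `print_r_reinstall_plan` is hypothetical, and both arguments are placeholders you must supply: the thread does not name the exact R package to remove, and it is an assumption here that `bin/dssadmin` lives under the DSS data directory.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the steps, runs nothing.
#   $1 = DSS data directory (the <DATA_DIR> placeholder above)
#   $2 = name of the installed R package (the <R_package> placeholder above)
print_r_reinstall_plan() {
    data_dir="$1"
    r_package="$2"
    # 1. Remove the system-wide R v4 installation.
    echo "apt-get remove $r_package"
    # 2. Delete the DSS-specific R package library.
    echo "rm -rf $data_dir/R.lib"
    # 3. Install R 3.6 (method depends on your distribution, so left as a note).
    echo "# install R 3.6 here"
    # 4. Re-run the DSS R integration (assumed to be under the data directory).
    echo "$data_dir/bin/dssadmin install-R-integration"
}
```

For example, `print_r_reinstall_plan /path/to/DATA_DIR your-r-package` prints four lines to copy and run by hand once they look right.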

 

Level 2
Author

Yes, I was able to follow those steps and resolve the R version issue.