
R in DataIku Tutorial

Solved!
webzest
Level 2

Hello,

I am working through the Dataiku R Tutorial and ran into some issues. I installed R 3.6 on my Ubuntu 20 VM, following these directions to make sure I was installing the correct supported version of R: https://linuxconfig.org/how-to-install-r-on-ubuntu-20-04

When I run through the tutorial, it fails at the first R code section, the one that generates the orders_by_customer variable or CSV output.

I noticed an RJSONIO package error in the report, but the package is not available from the deb repository that I am using (E: Unable to locate package r-cran-rjsonio).

Here is the error report from the Tutorial:

[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - ----------------------------------------
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS startup: jek version:8.0.0
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS home: /home/johnny/Documents/dataiku/dataml
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - OS: Linux 5.4.0-42-generic amd64 - Java: Ubuntu 11.0.8
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Allocated a slot for this activity!
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Run activity
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default pre-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checking if sources are ready
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checked source readiness DKU_TUTORIAL_R.orders -> true
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Computing hashes to propagate BEFORE activity
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recorded 1 hashes before activity run
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Building recipe runner of type
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Filling source sizes
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Done filling source sizes
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recipe runner built, will use 1 thread(s)
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Starting execution thread: com.dataiku.dip.dataflow.exec.r.RRecipeRunner@53af4105
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Execution threads started, waiting for activity end
[07:25:23] [INFO] [dku.flow.activity] - Run thread for activity compute_orders_by_customer_NP starting
[07:25:23] [INFO] [dku.venv.selector] - Select code env lang=R projectSelection={"mode":"INHERIT","preventOverride":false} globalDefault=null
[07:25:23] [INFO] [dku.flow.R] - Starting execution of user's R code
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"DKU_TUTORIAL_R/orders_by_customer","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [WARN] [dku.fs.local] - File does not exist: /home/johnny/Documents/dataiku/dataml/managed_datasets/DKU_TUTORIAL_R/orders_by_customer
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM DKU_TUTORIAL_R is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[07:25:23] [INFO] [dku.code.projectLibs] - chunkFolder is /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R
[07:25:23] [INFO] [xxx] - RSRC PATH: ["/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/remote-run-env-def.json
[07:25:23] [INFO] [dku.code.envs.resolution] - Executing R activity in builtin env
[07:25:23] [INFO] [dku.recipes.code.r] - Execute activity command: ["/usr/bin/R","--quiet","--no-save","--args","/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Run command insecurely, from user johnny
[07:25:23] [INFO] [dku.security.process] - Starting process (regular)
[07:25:23] [INFO] [dku.security.process] - Process started with pid=45571
[07:25:23] [INFO] [dku.processes.cgroups] - Will use cgroups []
[07:25:23] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: []
[07:25:23] [INFO] [dku.recipes.code.base] - Process reads from nothing
[07:25:23] [INFO] [dku.resourceusage] - Reporting start of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting start of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.utils]  - > options(echo=T)
[07:25:23] [INFO] [dku.utils]  - > args <- commandArgs(TRUE);
[07:25:23] [INFO] [dku.utils]  - > print (paste("Executing R script: ", args));
[07:25:23] [INFO] [dku.utils]  - [1] "Executing R script:  /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"
[07:25:23] [INFO] [dku.utils]  - > 
[07:25:23] [INFO] [dku.utils]  - > runsRemotely = FALSE;
[07:25:23] [INFO] [dku.utils]  - > jobCwd = NULL;
[07:25:23] [INFO] [dku.utils]  - > dkuExecEnv = NULL;
[07:25:23] [INFO] [dku.utils]  - > scriptFile = args[1]
[07:25:23] [INFO] [dku.utils]  - > if (file.exists("remote-run-env-def.json")) {
[07:25:23] [INFO] [dku.utils]  - +     library("RJSONIO");
[07:25:23] [INFO] [dku.utils]  - +     dkuExecEnv = fromJSON(file("remote-run-env-def.json"))
[07:25:23] [INFO] [dku.utils]  - +     runsRemotely = dkuExecEnv$runsRemotely
[07:25:23] [INFO] [dku.utils]  - + }
[07:25:23] [INFO] [dku.utils]  - Error in library("RJSONIO") : there is no package called ‘RJSONIO’
[07:25:23] [INFO] [dku.utils]  - Execution halted
[07:25:23] [WARN] [dku.resource]  - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.usage.computeresource.jek]  - Reporting update of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [DEBUG] [dku.resource]  - Process stats for pid 45571: {"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}
[07:25:23] [WARN] [dku.resource] - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"endTime":1596021923679,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.flow.activity] - Run thread failed for activity compute_orders_by_customer_NP
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
	at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
	at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - activity is finished
[07:25:23] [ERROR] [dku.flow.activity] running compute_orders_by_customer_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
	at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
	at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default post-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Removing samples for DKU_TUTORIAL_R.orders_by_customer
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Done post-activity tasks
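
In a long job log like the one above, the line that matters is the R error about a missing package. Here is a minimal shell sketch for pulling the package name out of a saved log. The function name `missing_r_packages` and the file name in the usage comment are made up for illustration, and the sketch assumes the log uses R's standard "there is no package called" wording:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a job log on stdin and print the name of every
# R package reported as missing, one per line, without duplicates.
missing_r_packages() {
  # Skip whatever quote characters surround the name, then capture the name
  # itself (letters, digits and dots).
  sed -n 's/.*there is no package called [^A-Za-z0-9.]*\([A-Za-z0-9.]*\).*/\1/p' | sort -u
}

# Usage, with a hypothetical log file name:
#   missing_r_packages < job.log
```

A missing RJSONIO here is a symptom rather than the root cause: the package is loaded by the DSS wrapper script, not by the tutorial code.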

 

6 Replies
sergeyd
Dataiker

Hi, 

You have installed R system-wide, but for R to work in DSS you also need to run the R integration step. Please follow the instructions at this link:

https://doc.dataiku.com/dss/latest/installation/r.html#case-1-automatic-installation-if-your-dss-ser...

 

webzest
Level 2
Author

Hi,

 

Thank you for the link. R is now available as a kernel in the Jupyter notebook; however, the dataiku library is now broken and I do not know how to reinstall it.

 

Error: package or namespace load failed for ‘dataiku’:
 package ‘dataiku’ was installed before R 4.0.0: please re-install it
Traceback:

1. library(dataiku)
2. tryCatch({
 .     attr(package, "LibPath") <- which.lib.loc
 .     ns <- loadNamespace(package, lib.loc)
 .     env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)
 . }, error = function(e) {
 .     P <- if (!is.null(cc <- conditionCall(e))) 
 .         paste(" in", deparse(cc)[1L])
 .     else ""
 .     msg <- gettextf("package or namespace load failed for %s%s:\n %s", 
 .         sQuote(package), P, conditionMessage(e))
 .     if (logical.return) 
 .         message(paste("Error:", msg), domain = NA)
 .     else stop(msg, call. = FALSE, domain = NA)
 . })
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. value[[3L]](cond)
6. stop(msg, call. = FALSE, domain = NA)
sergeyd
Dataiker

It looks like you have installed R v4.0. To double-check, please run this in an R notebook:

R.Version()

DSS doesn't support R version 4.0. You need to install 3.6. 
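
The same check can be done from a terminal. This is a minimal sketch, assuming `R` is on the PATH and prints its usual first line (`R version X.Y.Z ...`). The helper names `r_major_minor` and `is_r_36` are made up for illustration:

```shell
#!/usr/bin/env bash
# Extract "MAJOR.MINOR" from the first line of `R --version` output,
# e.g. 'R version 4.0.2 (2020-06-22) ...' gives 4.0
r_major_minor() {
  printf '%s\n' "$1" | sed -n '1s/^R version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds only for the 3.6 series, the version this
# reply says is required.
is_r_36() {
  [ "$(r_major_minor "$1")" = "3.6" ]
}

# Usage (needs R on the PATH):
#   is_r_36 "$(R --version)" && echo "R 3.6 found" || echo "different R version"
```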

sergeyd
Dataiker

@webzest I see that you opened another thread here: https://community.dataiku.com/t5/Using-Dataiku-DSS/DATAIKU-R-Package-is-older-and-failing-to-load-in...

Basically, there are several options to install R in DSS:

1) Install it manually as you did and then run ./bin/dssadmin install-R-integration

2) Use DSS deps installer: dataiku-dss-VERSION/scripts/install/install-deps.sh -check -without-java -without-python -with-r

In your case (R v4 is already installed), you will need to uninstall R system-wide (apt-get remove <R_package> or something like that), remove the directory <DATA_DIR>/R.lib (the DSS-specific R package library), install R v3.6, and then re-run the ./bin/dssadmin install-R-integration script.
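
The steps above can be sketched as a dry run that only prints the commands, so they can be reviewed before anything is removed. Both arguments are placeholders you must supply yourself: the data directory and the R package name are not given in this thread. The function name `print_r_reinstall_plan` is made up, and the sketch assumes `./bin/dssadmin` is run from the DSS data directory:

```shell
#!/usr/bin/env bash
# Dry run: print the reinstall steps without executing any of them.
#   $1  DSS data directory (<DATA_DIR> above)
#   $2  name of the installed R package to remove (<R_package> above)
print_r_reinstall_plan() {
  local data_dir="$1" r_package="$2"
  # Removing a system package usually needs root privileges.
  echo "apt-get remove ${r_package}"
  # Drop the DSS-specific R package library built against the old R.
  echo "rm -rf ${data_dir}/R.lib"
  # The thread does not say how to install R 3.6, so this stays a comment.
  echo "# install R 3.6 here"
  echo "cd ${data_dir} && ./bin/dssadmin install-R-integration"
}

# Usage, with your own values for both arguments:
#   print_r_reinstall_plan "<DATA_DIR>" "<R_package>"
```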

 

webzest
Level 2
Author

Yes, I was able to follow those steps and resolve the R version issue.

CoreyS
Dataiker Alumni

Good news @webzest! DSS can now use R 4 with Dataiku 11.1. In order to use R 4, you need to run the R integration procedure with “R” in the PATH pointing to R 4. All code environments then need to be rebuilt. Cloud Stacks setups are still on R 3.6, and will switch to R 4 in DSS 12.
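
Before re-running the integration, it can help to confirm which `R` the PATH actually resolves to. This is a minimal sketch with a made-up helper name, `first_r_in_path`. It walks a PATH-style string and prints the first executable called `R`, much as the shell itself would pick it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first executable named "R" found in a
# colon-separated PATH-style string; return 1 if there is none.
first_r_in_path() {
  local path_list="$1" dir
  local IFS=':'
  for dir in $path_list; do
    # An empty PATH entry means the current directory.
    if [ -x "${dir:-.}/R" ] && [ ! -d "${dir:-.}/R" ]; then
      printf '%s\n' "${dir:-.}/R"
      return 0
    fi
  done
  return 1
}

# Usage:
#   first_r_in_path "$PATH"    # then run that binary with --version
```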

More information about Dataiku 11.1 can be found here: Brand New Features for Modelers, ML Engineers, and Analysts in Dataiku 11.1 and in the release notes.
