R in Dataiku Tutorial
Hello,
I am working through the Dataiku R tutorial and ran into some issues. I installed R 3.6 on my Ubuntu 20 VM, following these directions to make sure I was installing the correct supported version of R: https://linuxconfig.org/how-to-install-r-on-ubuntu-20-04
When I run through the tutorial, it fails at the first R code section, the one that generates the orders_by_customer variable or CSV output.
I noticed an RJSONIO package error in the report, but the package is not available from the deb repository that I am using (E: Unable to locate package r-cran-rjsonio).
Here is the error report from the Tutorial:
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - ---------------------------------------- [07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS startup: jek version:8.0.0 [07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS home: /home/johnny/Documents/dataiku/dataml [07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - OS: Linux 5.4.0-42-generic amd64 - Java: Ubuntu 11.0.8 [07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Allocated a slot for this activity! [07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Run activity [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default pre-activity lifecycle hook [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checking if sources are ready [07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP [07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix= [07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/ [07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511 [07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checked source readiness DKU_TUTORIAL_R.orders -> true [07:25:23] 
[DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Computing hashes to propagate BEFORE activity [07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP [07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix= [07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/ [07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511 [07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA [07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recorded 1 hashes before activity run [07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Building recipe runner of type [07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Filling source sizes [07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix= [07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/ [07:25:23] [DEBUG] 
[dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511 [07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Done filling source sizes [07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recipe runner built, will use 1 thread(s) [07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Starting execution thread: com.dataiku.dip.dataflow.exec.r.RRecipeRunner@53af4105 [07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Execution threads started, waiting for activity end [07:25:23] [INFO] [dku.flow.activity] - Run thread for activity compute_orders_by_customer_NP starting [07:25:23] [INFO] [dku.venv.selector] - Select code env lang=R projectSelection={"mode":"INHERIT","preventOverride":false} globalDefault=null [07:25:23] [INFO] [dku.flow.R] - Starting execution of user's R code [07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"DKU_TUTORIAL_R/orders_by_customer","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [07:25:23] [WARN] [dku.fs.local] - File does not exist: /home/johnny/Documents/dataiku/dataml/managed_datasets/DKU_TUTORIAL_R/orders_by_customer [07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [07:25:23] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM DKU_TUTORIAL_R is 
{"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]} [07:25:23] [INFO] [dku.code.projectLibs] - chunkFolder is /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R [07:25:23] [INFO] [xxx] - RSRC PATH: ["/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R"] [07:25:23] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/remote-run-env-def.json [07:25:23] [INFO] [dku.code.envs.resolution] - Executing R activity in builtin env [07:25:23] [INFO] [dku.recipes.code.r] - Execute activity command: ["/usr/bin/R","--quiet","--no-save","--args","/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"] [07:25:23] [INFO] [dku.recipes.code.base] - Run command insecurely, from user johnny [07:25:23] [INFO] [dku.security.process] - Starting process (regular) [07:25:23] [INFO] [dku.security.process] - Process started with pid=45571 [07:25:23] [INFO] [dku.processes.cgroups] - Will use cgroups [] [07:25:23] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: [] [07:25:23] [INFO] [dku.recipes.code.base] - Process reads from nothing [07:25:23] [INFO] [dku.resourceusage] - Reporting start of 
CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}} [07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting start of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}} [07:25:23] [INFO] [dku.utils] - > options(echo=T) [07:25:23] [INFO] [dku.utils] - > args <- commandArgs(TRUE); [07:25:23] [INFO] [dku.utils] - > print (paste("Executing R script: ", args)); [07:25:23] [INFO] [dku.utils] - [1] "Executing R script: /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R" [07:25:23] [INFO] [dku.utils] - > [07:25:23] [INFO] [dku.utils] - > runsRemotely = FALSE; [07:25:23] [INFO] [dku.utils] - > jobCwd = NULL; [07:25:23] [INFO] [dku.utils] - > dkuExecEnv = NULL; [07:25:23] [INFO] [dku.utils] - > scriptFile = args[1] [07:25:23] [INFO] [dku.utils] - > if (file.exists("remote-run-env-def.json")) { [07:25:23] [INFO] [dku.utils] - + library("RJSONIO"); [07:25:23] [INFO] [dku.utils] - + dkuExecEnv = fromJSON(file("remote-run-env-def.json")) [07:25:23] [INFO] [dku.utils] - + runsRemotely = dkuExecEnv$runsRemotely [07:25:23] [INFO] [dku.utils] - + } [07:25:23] [INFO] [dku.utils] - Error in library("RJSONIO") : there is no package called ‘RJSONIO’ [07:25:23] [INFO] [dku.utils] - 
Execution halted [07:25:23] [WARN] [dku.resource] - stat file for pid 45571 does not exist. Process died? [07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting update of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}} [07:25:23] [DEBUG] [dku.resource] - Process stats for pid 45571: {"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0} [07:25:23] [WARN] [dku.resource] - stat file for pid 45571 does not exist. Process died? [07:25:23] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}} [07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"endTime":1596021923679,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}} [07:25:23] [INFO] [dku.flow.activity] - Run thread failed for 
activity compute_orders_by_customer_NP com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs. at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23) at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189) at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103) at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39) at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57) at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374) [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - activity is finished [07:25:23] [ERROR] [dku.flow.activity] running compute_orders_by_customer_NP - Activity failed com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs. 
at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23) at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189) at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103) at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39) at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57) at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374) [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default post-activity lifecycle hook [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Removing samples for DKU_TUTORIAL_R.orders_by_customer [07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Done post-activity tasks
Best Answers
-
Sergey Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 365 Dataiker
Hi,
You have installed R globally on the system. For R to work in DSS, you also need to perform the R integration. Please follow this link to do this:
-
Sergey Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 365 Dataiker
It looks like you have installed R v4.0. To double-check this, please run the following in an R notebook:
R.Version()
DSS doesn't support R version 4.0. You need to install 3.6.
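The same check can also be done from a terminal. The following is a minimal sketch, assuming R is on the PATH of the user that runs DSS; the helper name r_major is made up for this example and is not part of DSS or R.

```shell
#!/bin/sh
# r_major is a hypothetical helper: it reads "R --version" style output on stdin
# and prints only the major version number (e.g. 3 or 4).
r_major() {
    # Expected first line looks like: R version 3.6.3 (2020-02-29) -- "..."
    sed -n 's/^R version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

if command -v R >/dev/null 2>&1; then
    major=$(R --version | r_major)
    echo "R major version on PATH: $major"
else
    echo "R not found on PATH"
fi
```

If this prints 4 while you expected 3.6, the R binary that DSS picks up is probably not the one you meant to install.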
-
Sergey Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 365 Dataiker
@webzest
I see that you opened another thread here: https://community.dataiku.com/t5/Using-Dataiku-DSS/DATAIKU-R-Package-is-older-and-failing-to-load-in-DSS/m-p/8739
Basically, there are several options to install R in DSS:
1) Install it manually as you did and then run ./bin/dssadmin install-R-integration
2) Use DSS deps installer: dataiku-dss-VERSION/scripts/install/install-deps.sh -check -without-java -without-python -with-r
In your case (when you have R v4 already installed), you will need to remove the system-wide R (apt-get remove <R_package> or something like that), remove the directory <DATA_DIR>/R.lib (the DSS-specific R package library), install R v3.6, and then re-run the ./bin/dssadmin install-R-integration script.
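As a rough sketch, the sequence above could look like the following. It only prints the commands rather than running them, so nothing is removed by accident. The package name r-base, the placeholder data directory path, and the assumption that dssadmin is run from inside the data directory are illustrative and not taken from this thread; print_plan is a made-up helper name.

```shell
#!/bin/sh
# Prints the reinstall steps for review; it does not execute any of them.
# Assumptions: the system R was installed from the "r-base" apt package,
# and ./bin/dssadmin is run from inside the DSS data directory.
print_plan() {
    data_dir=$1
    echo "sudo apt-get remove r-base"
    echo "rm -rf $data_dir/R.lib"
    echo "# install R 3.6 here, following your distribution's instructions"
    echo "cd $data_dir && ./bin/dssadmin install-R-integration"
}

# DATA_DIR is a placeholder; set it to your own DSS data directory.
print_plan "${DATA_DIR:-/path/to/DATA_DIR}"
```

Review the printed commands and run them one at a time once the package name and paths match your machine.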
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Good news @webzest! DSS can now use R 4 with Dataiku 11.1. In order to use R 4, you need to run the R integration procedure with “R” in the PATH pointing to R 4. All code environments then need to be rebuilt. Cloud Stacks setups are still on R 3.6, and will switch to R 4 in DSS 12.
More information about Dataiku 11.1 can be found here: Brand New Features for Modelers, ML Engineers, and Analysts in Dataiku 11.1 and in the release notes.
Answers
-
Hi,
Thank you for the link. R is now activated as a kernel in the Jupyter notebook; however, the dataiku library is now broken, and I do not know how to get it reinstalled:
Error: package or namespace load failed for ‘dataiku’: package ‘dataiku’ was installed before R 4.0.0: please re-install it
Traceback:
1. library(dataiku)
2. tryCatch({
 . attr(package, "LibPath") <- which.lib.loc
 . ns <- loadNamespace(package, lib.loc)
 . env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)
 . }, error = function(e) {
 . P <- if (!is.null(cc <- conditionCall(e)))
 . paste(" in", deparse(cc)[1L])
 . else ""
 . msg <- gettextf("package or namespace load failed for %s%s:\n %s",
 . sQuote(package), P, conditionMessage(e))
 . if (logical.return)
 . message(paste("Error:", msg), domain = NA)
 . else stop(msg, call. = FALSE, domain = NA)
 . })
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. value[[3L]](cond)
6. stop(msg, call. = FALSE, domain = NA)
-
Yes, I was able to follow those steps and resolve the R version issue.