R in Dataiku Tutorial

webzest Registered Posts: 12 ✭✭✭✭

Hello,

I am working through the Dataiku R tutorial and ran into some issues. I installed R 3.6 on my Ubuntu 20.04 VM, following these directions to make sure I was installing the correct, supported version of R: https://linuxconfig.org/how-to-install-r-on-ubuntu-20-04

When I run through the tutorial, it fails at the first R code section, which generates the orders_by_customer variable or CSV output.

I noticed an RJSONIO package error in the report, but the package is not available from the deb repository that I am using (E: Unable to locate package r-cran-rjsonio).

Here is the error report from the Tutorial:

[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - ----------------------------------------
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS startup: jek version:8.0.0
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS home: /home/johnny/Documents/dataiku/dataml
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - OS: Linux 5.4.0-42-generic amd64 - Java: Ubuntu 11.0.8
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Allocated a slot for this activity!
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Run activity
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default pre-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checking if sources are ready
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checked source readiness DKU_TUTORIAL_R.orders -> true
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Computing hashes to propagate BEFORE activity
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recorded 1 hashes before activity run
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Building recipe runner of type
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Filling source sizes
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Done filling source sizes
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recipe runner built, will use 1 thread(s)
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Starting execution thread: com.dataiku.dip.dataflow.exec.r.RRecipeRunner@53af4105
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Execution threads started, waiting for activity end
[07:25:23] [INFO] [dku.flow.activity] - Run thread for activity compute_orders_by_customer_NP starting
[07:25:23] [INFO] [dku.venv.selector] - Select code env lang=R projectSelection={"mode":"INHERIT","preventOverride":false} globalDefault=null
[07:25:23] [INFO] [dku.flow.R] - Starting execution of user's R code
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"DKU_TUTORIAL_R/orders_by_customer","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [WARN] [dku.fs.local] - File does not exist: /home/johnny/Documents/dataiku/dataml/managed_datasets/DKU_TUTORIAL_R/orders_by_customer
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM DKU_TUTORIAL_R is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[07:25:23] [INFO] [dku.code.projectLibs] - chunkFolder is /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R
[07:25:23] [INFO] [xxx] - RSRC PATH: ["/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/remote-run-env-def.json
[07:25:23] [INFO] [dku.code.envs.resolution] - Executing R activity in builtin env
[07:25:23] [INFO] [dku.recipes.code.r] - Execute activity command: ["/usr/bin/R","--quiet","--no-save","--args","/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Run command insecurely, from user johnny
[07:25:23] [INFO] [dku.security.process] - Starting process (regular)
[07:25:23] [INFO] [dku.security.process] - Process started with pid=45571
[07:25:23] [INFO] [dku.processes.cgroups] - Will use cgroups []
[07:25:23] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: []
[07:25:23] [INFO] [dku.recipes.code.base] - Process reads from nothing
[07:25:23] [INFO] [dku.resourceusage] - Reporting start of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting start of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.utils]  - > options(echo=T)
[07:25:23] [INFO] [dku.utils]  - > args <- commandArgs(TRUE);
[07:25:23] [INFO] [dku.utils]  - > print (paste("Executing R script: ", args));
[07:25:23] [INFO] [dku.utils]  - [1] "Executing R script:  /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"
[07:25:23] [INFO] [dku.utils]  - > 
[07:25:23] [INFO] [dku.utils]  - > runsRemotely = FALSE;
[07:25:23] [INFO] [dku.utils]  - > jobCwd = NULL;
[07:25:23] [INFO] [dku.utils]  - > dkuExecEnv = NULL;
[07:25:23] [INFO] [dku.utils]  - > scriptFile = args[1]
[07:25:23] [INFO] [dku.utils]  - > if (file.exists("remote-run-env-def.json")) {
[07:25:23] [INFO] [dku.utils]  - +     library("RJSONIO");
[07:25:23] [INFO] [dku.utils]  - +     dkuExecEnv = fromJSON(file("remote-run-env-def.json"))
[07:25:23] [INFO] [dku.utils]  - +     runsRemotely = dkuExecEnv$runsRemotely
[07:25:23] [INFO] [dku.utils]  - + }
[07:25:23] [INFO] [dku.utils]  - Error in library("RJSONIO") : there is no package called ‘RJSONIO’
[07:25:23] [INFO] [dku.utils]  - Execution halted
[07:25:23] [WARN] [dku.resource]  - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.usage.computeresource.jek]  - Reporting update of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [DEBUG] [dku.resource]  - Process stats for pid 45571: {"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}
[07:25:23] [WARN] [dku.resource] - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"endTime":1596021923679,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.flow.activity] - Run thread failed for activity compute_orders_by_customer_NP
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
   at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
   at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
   at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
   at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
   at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
   at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - activity is finished
[07:25:23] [ERROR] [dku.flow.activity] running compute_orders_by_customer_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
   at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
   at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
   at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
   at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
   at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
   at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default post-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Removing samples for DKU_TUTORIAL_R.orders_by_customer
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Done post-activity tasks
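
The log is long, but only the dku.utils lines echo the R session, and the failure itself is the `Error in library("RJSONIO")` line followed by `Execution halted`. As a minimal sketch for pulling those lines out of a saved job log: the function name `find_r_errors` and the regular expression are assumptions based on the log layout shown above, not anything provided by DSS.

```python
import re

# Assumed layout of a job-log line, taken from the log above:
# [07:25:23] [INFO] [dku.utils]  - Error in library("RJSONIO") : ...
LOG_LINE = re.compile(
    r"^\[(?P<time>[\d:]+)\] \[(?P<level>\w+)\] \[(?P<logger>[^\]]+)\]\s+-\s?(?P<message>.*)$"
)

def find_r_errors(log_text):
    """Return messages from dku.utils lines that report an R failure."""
    errors = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        # Only the dku.utils logger echoes the R session output.
        if not match or match.group("logger") != "dku.utils":
            continue
        message = match.group("message").strip()
        if message.startswith("Error") or message == "Execution halted":
            errors.append(message)
    return errors

# A few lines copied from the log above.
sample = '''[07:25:23] [INFO] [dku.utils]  - +     library("RJSONIO");
[07:25:23] [INFO] [dku.utils]  - Error in library("RJSONIO") : there is no package called ‘RJSONIO’
[07:25:23] [INFO] [dku.utils]  - Execution halted
[07:25:23] [WARN] [dku.resource]  - stat file for pid 45571 does not exist. Process died?'''

for message in find_r_errors(sample):
    print(message)
```

On the sample lines this prints the `Error in library("RJSONIO")` message and `Execution halted`, which points at a missing R package rather than a problem in the tutorial code.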

Answers

  • webzest Registered Posts: 12 ✭✭✭✭
    edited July 17

    Hi,

    Thank you for the link. R is now activated as a kernel in the Jupyter notebook; however, the dataiku library is now broken, and I do not know how to get it reinstalled.

    Error: package or namespace load failed for ‘dataiku’:
     package ‘dataiku’ was installed before R 4.0.0: please re-install it
    Traceback:
    
    1. library(dataiku)
    2. tryCatch({
     .     attr(package, "LibPath") <- which.lib.loc
     .     ns <- loadNamespace(package, lib.loc)
     .     env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)
     . }, error = function(e) {
     .     P <- if (!is.null(cc <- conditionCall(e))) 
     .         paste(" in", deparse(cc)[1L])
     .     else ""
     .     msg <- gettextf("package or namespace load failed for %s%s:\n %s", 
     .         sQuote(package), P, conditionMessage(e))
     .     if (logical.return) 
     .         message(paste("Error:", msg), domain = NA)
     .     else stop(msg, call. = FALSE, domain = NA)
     . })
    3. tryCatchList(expr, classes, parentenv, handlers)
    4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
    5. value[[3L]](cond)
    6. stop(msg, call. = FALSE, domain = NA)
  • webzest Registered Posts: 12 ✭✭✭✭

    Yes, I was able to follow those steps and resolve the R version issue.
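
For anyone hitting the same "was installed before R 4.0.0" message: R raises it when a package in the library was built under an older R than the one now running. Installed R packages normally record the build version in a `Built:` field of their DESCRIPTION file, so one way to spot stale packages is to read that field. The sketch below assumes that layout; the helper names and the example DESCRIPTION text are hypothetical and for illustration only.

```python
import re

def built_r_version(description_text):
    """Return the R version from a DESCRIPTION 'Built:' field as a tuple, or None."""
    match = re.search(
        r"^Built:\s*R (\d+)\.(\d+)\.(\d+)", description_text, re.MULTILINE
    )
    return tuple(int(part) for part in match.groups()) if match else None

def needs_reinstall(description_text, minimum=(4, 0, 0)):
    """True when the package was built under an R older than `minimum`."""
    built = built_r_version(description_text)
    return built is not None and built < minimum

# Hypothetical DESCRIPTION contents, not copied from a real installation.
example = """Package: dataiku
Built: R 3.6.3; ; 2020-07-29 11:20:00 UTC; unix
"""

print(needs_reinstall(example))
```

A package flagged this way has to be reinstalled under the new R version before `library()` will load it.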
