R in Dataiku Tutorial

webzest

Hello,

I am working through the Dataiku R tutorial and ran into some issues. I installed R 3.6 on my Ubuntu 20.04 VM, following these directions to make sure I was installing a supported version of R: https://linuxconfig.org/how-to-install-r-on-ubuntu-20-04

When I run through the tutorial, it fails at the first R code section, the one that generates the orders_by_customer dataset (CSV output).

I noticed an RJSONIO package error in the report, but the package is not available from the deb repository I am using (E: Unable to locate package r-cran-rjsonio).

Here is the error report from the tutorial:

[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - ----------------------------------------
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS startup: jek version:8.0.0
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - DSS home: /home/johnny/Documents/dataiku/dataml
[07:25:23] [INFO] [dku] running compute_orders_by_customer_NP - OS: Linux 5.4.0-42-generic amd64 - Java: Ubuntu 11.0.8
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Allocated a slot for this activity!
[07:25:23] [INFO] [dku.flow.jobrunner] running compute_orders_by_customer_NP - Run activity
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default pre-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checking if sources are ready
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Checked source readiness DKU_TUTORIAL_R.orders -> true
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Computing hashes to propagate BEFORE activity
[07:25:23] [DEBUG] [dku.dataset.hash] running compute_orders_by_customer_NP - Readiness cache miss for dataset__admin__DKU_TUTORIAL_R.orders__NP
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [INFO] [dku.dataset.hash] running compute_orders_by_customer_NP - Caching readiness for dataset__admin__DKU_TUTORIAL_R.orders__NP s=READY h=twgxBzU/4e4pwAGKukcQyA
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recorded 1 hashes before activity run
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Building recipe runner of type
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Filling source sizes
[07:25:23] [INFO] [dku.datasets.file] running compute_orders_by_customer_NP - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.ftplike] running compute_orders_by_customer_NP - Enumerating Filesystem dataset prefix=
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumerating local filesystem prefix=/
[07:25:23] [DEBUG] [dku.fs.local] running compute_orders_by_customer_NP - Enumeration done nb_paths=1 size=1516511
[07:25:23] [DEBUG] [dku.job.activity] running compute_orders_by_customer_NP - Done filling source sizes
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Recipe runner built, will use 1 thread(s)
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Starting execution thread: com.dataiku.dip.dataflow.exec.r.RRecipeRunner@53af4105
[07:25:23] [DEBUG] [dku.flow.activity] running compute_orders_by_customer_NP - Execution threads started, waiting for activity end
[07:25:23] [INFO] [dku.flow.activity] - Run thread for activity compute_orders_by_customer_NP starting
[07:25:23] [INFO] [dku.venv.selector] - Select code env lang=R projectSelection={"mode":"INHERIT","preventOverride":false} globalDefault=null
[07:25:23] [INFO] [dku.flow.R] - Starting execution of user's R code
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"DKU_TUTORIAL_R/orders_by_customer","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [WARN] [dku.fs.local] - File does not exist: /home/johnny/Documents/dataiku/dataml/managed_datasets/DKU_TUTORIAL_R/orders_by_customer
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"/home/johnny/Documents/dataiku/dataml/uploads/DKU_TUTORIAL_R/datasets/orders","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[07:25:23] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM DKU_TUTORIAL_R is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[07:25:23] [INFO] [dku.code.projectLibs] - chunkFolder is /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R
[07:25:23] [INFO] [xxx] - RSRC PATH: ["/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/localconfig/projects/DKU_TUTORIAL_R/lib/R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/remote-run-env-def.json
[07:25:23] [INFO] [dku.code.envs.resolution] - Executing R activity in builtin env
[07:25:23] [INFO] [dku.recipes.code.r] - Execute activity command: ["/usr/bin/R","--quiet","--no-save","--args","/home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"]
[07:25:23] [INFO] [dku.recipes.code.base] - Run command insecurely, from user johnny
[07:25:23] [INFO] [dku.security.process] - Starting process (regular)
[07:25:23] [INFO] [dku.security.process] - Process started with pid=45571
[07:25:23] [INFO] [dku.processes.cgroups] - Will use cgroups []
[07:25:23] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: []
[07:25:23] [INFO] [dku.recipes.code.base] - Process reads from nothing
[07:25:23] [INFO] [dku.resourceusage] - Reporting start of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting start of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"cpuCurrent":0.0}}
[07:25:23] [INFO] [dku.utils]  - > options(echo=T)
[07:25:23] [INFO] [dku.utils]  - > args <- commandArgs(TRUE);
[07:25:23] [INFO] [dku.utils]  - > print (paste("Executing R script: ", args));
[07:25:23] [INFO] [dku.utils]  - [1] "Executing R script:  /home/johnny/Documents/dataiku/dataml/jobs/DKU_TUTORIAL_R/Build_orders_by_customer_2020-07-29T11-25-17.666/compute_orders_by_customer_NP/r-recipe/routp0O7llPBWwKq/R-recipe.R"
[07:25:23] [INFO] [dku.utils]  - >
[07:25:23] [INFO] [dku.utils]  - > runsRemotely = FALSE;
[07:25:23] [INFO] [dku.utils]  - > jobCwd = NULL;
[07:25:23] [INFO] [dku.utils]  - > dkuExecEnv = NULL;
[07:25:23] [INFO] [dku.utils]  - > scriptFile = args[1]
[07:25:23] [INFO] [dku.utils]  - > if (file.exists("remote-run-env-def.json")) {
[07:25:23] [INFO] [dku.utils]  - +     library("RJSONIO");
[07:25:23] [INFO] [dku.utils]  - +     dkuExecEnv = fromJSON(file("remote-run-env-def.json"))
[07:25:23] [INFO] [dku.utils]  - +     runsRemotely = dkuExecEnv$runsRemotely
[07:25:23] [INFO] [dku.utils]  - + }
[07:25:23] [INFO] [dku.utils]  - Error in library("RJSONIO") : there is no package called ‘RJSONIO’
[07:25:23] [INFO] [dku.utils]  - Execution halted
[07:25:23] [WARN] [dku.resource]  - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.usage.computeresource.jek]  - Reporting update of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [DEBUG] [dku.resource]  - Process stats for pid 45571: {"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}
[07:25:23] [WARN] [dku.resource] - stat file for pid 45571 does not exist. Process died?
[07:25:23] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.usage.computeresource.jek] - Reporting completion of resource usage: {"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_R","jobId":"Build_orders_by_customer_2020-07-29T11-25-17.666","activityId":"compute_orders_by_customer_NP","activityType":"recipe","recipeType":"r","recipeName":"compute_orders_by_customer"},"type":"LOCAL_PROCESS","id":"Ch9ThTfMLubpTvYr","startTime":1596021923631,"endTime":1596021923679,"localProcess":{"pid":45571,"commandName":"/usr/bin/R","cpuCurrent":0.0,"vmRSSTotalMBS":0}}
[07:25:23] [INFO] [dku.flow.activity] - Run thread failed for activity compute_orders_by_customer_NP
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
	at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
	at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - activity is finished
[07:25:23] [ERROR] [dku.flow.activity] running compute_orders_by_customer_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The R process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:189)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
	at com.dataiku.dip.recipes.code.r.AbstractRRecipeRunner.executeScript(AbstractRRecipeRunner.java:39)
	at com.dataiku.dip.dataflow.exec.r.RRecipeRunner.run(RRecipeRunner.java:57)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Executing default post-activity lifecycle hook
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Removing samples for DKU_TUTORIAL_R.orders_by_customer
[07:25:23] [INFO] [dku.flow.activity] running compute_orders_by_customer_NP - Done post-activity tasks
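
Buried in the log above, the actual failure is R's own message: Error in library("RJSONIO") : there is no package called ‘RJSONIO’. The R interpreter that DSS launches (/usr/bin/R, per the "Execute activity command" line) cannot load RJSONIO, so the recipe wrapper halts before the recipe code itself runs. For job logs this long, scanning for that message is a quick way to list every missing package. The sketch below is a hypothetical helper (missing_r_packages is not part of Dataiku) and assumes the log has been copied out as plain text:

```python
import re

# R reports a failed library() call as: there is no package called ‘NAME’
# The name is wrapped in typographic quotes in UTF-8 locales and in plain
# ASCII quotes otherwise, so the pattern accepts either form.
MISSING_PKG = re.compile(r"there is no package called [‘'](?P<name>[A-Za-z0-9.]+)[’']")


def missing_r_packages(log_text: str) -> list[str]:
    """Return R package names reported as missing, in first-seen order, without duplicates."""
    seen: list[str] = []
    for match in MISSING_PKG.finditer(log_text):
        name = match.group("name")
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Two lines copied from the job log above.
    sample = (
        "[07:25:23] [INFO] [dku.utils]  - Error in library(\"RJSONIO\") : "
        "there is no package called ‘RJSONIO’\n"
        "[07:25:23] [INFO] [dku.utils]  - Execution halted\n"
    )
    print(missing_r_packages(sample))  # ['RJSONIO']
```

When a package is missing only from the distribution's deb repository, it can usually still be installed from CRAN inside R with install.packages("RJSONIO"), run with the same /usr/bin/R that DSS calls.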


Answers

  • webzest

    Hi,

    Thank you for the link. R is now activated as a kernel in the Jupyter notebook; however, the dataiku library is now broken, and I do not know how to reinstall it:

    Error: package or namespace load failed for ‘dataiku’:
     package ‘dataiku’ was installed before R 4.0.0: please re-install it
    Traceback:
    1. library(dataiku)
    2. tryCatch({
    .     attr(package, "LibPath") <- which.lib.loc
    .     ns <- loadNamespace(package, lib.loc)
    .     env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)
    . }, error = function(e) {
    .     P <- if (!is.null(cc <- conditionCall(e)))
    .         paste(" in", deparse(cc)[1L])
    .     else ""
    .     msg <- gettextf("package or namespace load failed for %s%s:\n %s",
    .         sQuote(package), P, conditionMessage(e))
    .     if (logical.return)
    .         message(paste("Error:", msg), domain = NA)
    .     else stop(msg, call. = FALSE, domain = NA)
    . })
    3. tryCatchList(expr, classes, parentenv, handlers)
    4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
    5. value[[3L]](cond)
    6. stop(msg, call. = FALSE, domain = NA)
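
Since R 4.0.0, library() refuses to load a package whose installed copy was built under an older R, which is what this message reports for dataiku. Any other package installed under R 3.6 is affected the same way, so it can help to list them all before reinstalling. The sketch below assumes the Built line that R records in each installed package's DESCRIPTION file (for example "Built: R 3.6.3; ; <date>; unix"); the function names are hypothetical, and the library directory to scan is whichever path .libPaths() reports in R:

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed format of the Built field in an installed package's DESCRIPTION, e.g.
#   Built: R 3.6.3; ; 2020-05-20 10:00:00 UTC; unix
BUILT_FIELD = re.compile(r"^Built:\s*R\s+(\d+)\.(\d+)\.(\d+)", re.MULTILINE)


def built_r_version(description_text: str) -> tuple[int, int, int] | None:
    """Return the (major, minor, patch) R version a package was built with, or None."""
    match = BUILT_FIELD.search(description_text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def needs_reinstall(description_text: str) -> bool:
    """True when the Built field predates R 4.0.0, the case R 4.x refuses to load."""
    version = built_r_version(description_text)
    return version is not None and version < (4, 0, 0)


def stale_packages(library_dir: str) -> list[str]:
    """Names of package folders under library_dir whose DESCRIPTION predates R 4.0.0."""
    stale = []
    for description in sorted(Path(library_dir).glob("*/DESCRIPTION")):
        text = description.read_text(encoding="utf-8", errors="replace")
        if needs_reinstall(text):
            stale.append(description.parent.name)
    return stale


if __name__ == "__main__":
    # Made-up DESCRIPTION excerpt, for illustration only.
    example = "Package: dataiku\nBuilt: R 3.6.3; ; 2020-07-29 11:00:00 UTC; unix\n"
    print(needs_reinstall(example))  # True
```

Each package this reports then needs reinstalling under the new R. The dataiku package ships with DSS rather than CRAN, so it is reinstalled through the DSS R integration setup rather than with install.packages().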
  • webzest

    Yes, I was able to follow those steps and resolve the R version issue.
