Call Julia lang in Notebooks inside Dataiku?

Florent
Florent Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer Posts: 2 ✭✭✭✭
I have Julia installed on my CentOS 7 server alongside Dataiku. Is there a smart way to open notebooks with IJulia inside Dataiku?



Regards,

Florent

Answers

  • kldehoff
    kldehoff Registered Posts: 3 ✭✭✭✭
    edited July 17

    Perhaps a bit late on this post, but there is a somewhat hack-ish way to load the Julia kernel inside Dataiku. Please note that I have only accomplished this on a single-user setup, although it is likely to work similarly for a multi-user setup.

    1. Make sure Julia is installed and on the current (Dataiku) user path.
    2. Create and run a new shell recipe (output to an empty folder if required) with the following contents:
      julia -E 'using Pkg; Pkg.add("IJulia")'
      When run, this installs the IJulia package and adds the Julia kernel to Dataiku's list of Jupyter kernels.
    3. Start a new Jupyter notebook (any kernel will do)
    4. Change the kernel to "Julia x.y.z" (x.y.z is the installed version)
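
    Steps 1 and 2 above can be combined in one shell recipe body that first checks whether Julia is visible to the Dataiku user. This is only a rough sketch: check_julia is a hypothetical helper name, and I have not run it on a multi-user setup.

    ```shell
    #!/bin/sh
    # Hypothetical helper: report whether julia is on the current user's PATH.
    check_julia() {
        if command -v julia >/dev/null 2>&1; then
            echo "julia found at $(command -v julia)"
            return 0
        else
            echo "julia not found on PATH" >&2
            return 1
        fi
    }

    # Install IJulia only when Julia is actually reachable from the recipe.
    if check_julia; then
        julia -e 'using Pkg; Pkg.add("IJulia")'
    fi
    ```

    If the recipe reports that julia is not found, the PATH of the Dataiku user differs from your interactive shell's PATH, so fix that before retrying.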

    In addition, it is possible using the PyCall and DataFrames packages to load Dataiku data into Julia. It may be possible to save it back in a similar fashion, but I have not attempted this yet.

    For loading Dataiku datasets, the following template should work, but requires the PyCall.jl, Pandas.jl, and DataFrames.jl packages to be installed.

    #this forces PyCall to use Dataiku's built-in Python environment (adjust the path to your install)
    using Pkg
    ENV["PYTHON"] = "/data/dataiku/bin/python"
    Pkg.build("PyCall")

    using PyCall, Pandas, DataFrames

    #load Dataiku Python libraries
    dataiku = pyimport("dataiku")
    pd = pyimport("pandas")

    #load the dataset into both a Pandas dataframe and a Julia DataFrames dataframe
    mydataset = dataiku.Dataset("test")
    pd_df = Pandas.DataFrame(mydataset.get_dataframe())
    jl_df = DataFrames.DataFrame(pd_df)

    #test the load (first(df, n) replaces head in current DataFrames.jl releases)
    Pandas.head(pd_df)
    first(jl_df, 5)
  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    It's never too late! Thank you for your contribution.

  • dshurgatwa
    dshurgatwa Registered Posts: 1 ✭✭✭

    How to use

    After the installation, you can create and run Julia recipes the same way as any other code recipe. A Julia kernel also becomes available for Jupyter notebooks. Inside recipes and notebooks, use the Dataiku.jl package to interact with DSS. This package is a wrapper around the DSS Public API and provides functions to read and write datasets and folders in DSS easily. See the package's README.md for documentation.

    Code environments

    For now, it is not possible to have multiple code environments in Julia. Therefore, all Julia recipes and notebooks use the same environment, located at $DSS_HOME/code-envs/julia. To install or remove packages, this environment has to be managed manually with Julia's built-in package manager. There are two ways to do that:

    1. By using Pkg inside a Jupyter notebook in DSS
    2. By running julia with the environment variable JULIA_DEPOT_PATH=$DSS_HOME/code-envs/julia
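
    For the option of running julia with JULIA_DEPOT_PATH set, a minimal sketch from a terminal on the DSS server might look like the following. The helper names julia_depot_path and dss_julia_add are hypothetical (they are not part of DSS or Julia), and the data directory is passed in explicitly because its location differs between installs.

    ```shell
    #!/bin/sh
    # Hypothetical helper: build the shared Julia environment path
    # from the DSS data directory given as $1.
    julia_depot_path() {
        printf '%s/code-envs/julia\n' "$1"
    }

    # Hypothetical helper: add one package to the shared Julia environment.
    # $1 = DSS data directory, $2 = Julia package name
    dss_julia_add() {
        JULIA_DEPOT_PATH="$(julia_depot_path "$1")" \
            julia -e 'using Pkg; Pkg.add(ARGS[1])' "$2"
    }
    ```

    With DSS_HOME exported, calling dss_julia_add "$DSS_HOME" CSV would add the CSV package. Run it as the same user that runs DSS so that file ownership under the code-envs folder stays consistent.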
