How to upload a folder into the Dataiku library

epsi95
epsi95 Dataiku DSS Core Concepts, Registered Posts: 15 ✭✭✭✭

I am using the Dataiku library to add custom packages.

I have a few constraints:

  • I can't use Git.
  • My packages contain many subpackages, for example:

package
|-- subpackage 1
|-- subpackage 2
|-- subpackage 3

and so on.

Currently, I create the folders and upload the files manually. This is very tedious: every time I change some files, I have to re-upload all of them because I don't keep track of which ones changed.

Questions:

  1. Why can't we upload a folder? It seems odd to me that Dataiku does not allow uploading a folder to the library. Can anyone help me with this?
  2. Can I write a Python script, or use FTP, to upload or create the files and folders on Dataiku?
    Operating system used: Windows

Best Answer

  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker
    edited July 17 Answer ✓

    Hi,

    Unfortunately, as you mentioned, it is not currently possible to upload a folder to the libraries. I see this feature request in our backlog, but we can't provide a timeline for when it might be implemented.

    If your instance is not UIF-enabled (UIF is the User Isolation Framework), you can try writing a script to do it manually from a Python notebook. Please proceed with caution, though: it is never a good idea to modify files in the data_dir directory directly, because if you break something it may lead to downtime (make sure you have a backup). This is definitely not a recommended way of doing it.

    You can get the path of the project's lib directory as shown below and then add your folder there:

    import os

    import dataiku

    # Read the project key and the DSS data directory from the custom variables
    project_name = dataiku.get_custom_variables()["projectKey"]
    dip_home = dataiku.get_custom_variables()["dip.home"]

    # Build the path to this project's lib directory and list its contents
    libraries_path = os.path.join(dip_home, "config", "projects", project_name, "lib")
    print(libraries_path)
    print(os.listdir(libraries_path))
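
    As a rough sketch of the "add your folder there" step, the helper below copies a directory tree into a destination directory such as the library path. It assumes the notebook user can write to that directory. `copy_tree` is a hypothetical helper name, not part of the Dataiku API.

    ```python
    import os
    import shutil


    def copy_tree(src_dir, dst_dir):
        """Copy every file under src_dir into dst_dir, keeping the subfolder layout."""
        copied = []
        for root, _dirs, files in os.walk(src_dir):
            # Path of the current folder relative to the source root ("." for the root itself)
            rel = os.path.relpath(root, src_dir)
            target = os.path.normpath(os.path.join(dst_dir, rel))
            os.makedirs(target, exist_ok=True)
            for name in files:
                # Existing files with the same name are overwritten
                shutil.copy2(os.path.join(root, name), os.path.join(target, name))
                copied.append(os.path.normpath(os.path.join(rel, name)))
        return sorted(copied)
    ```

    The source folder must already be reachable from the DSS server, and the exact destination subfolder depends on how your library is laid out. Files that no longer exist in the source are not removed from the destination.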

    Hope this helps.

    -Best.

Answers

  • epsi95
    epsi95 Dataiku DSS Core Concepts, Registered Posts: 15 ✭✭✭✭

    Hi, @VitaliyD
    Thanks for the reply. What I am trying to do is upload files from my computer to Dataiku. I can reach the Dataiku URL because my computer and Dataiku are on the same network. I am using the `dataikuapi` library; can you guide me on how to proceed with it?

    ```python
    import dataikuapi

    # Set Dataiku URL and API Key
    host = "https://xxxx:xxxxxx"
    apiKey = "xxxxxx"

    # Create API client
    client = dataikuapi.DSSClient(host, apiKey)

    # Ignore SSL checks as these may fail without access to root CA certs
    client._session.verify = False

    project = client.get_project("xxxx")
    ```

  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker
    edited July 17

    Hi, you can learn how to use the Dataiku API remotely from this guide. At a high level, the complete solution looks like this: from your local computer, use the Dataiku API remotely to upload a zipped library to a DSS managed folder on the local filesystem. Then, on the DSS side, write either a macro (as a plugin developer) or a Python notebook that copies the file from the managed folder to the project library directory and unzips it there using the os and zipfile Python packages (I described how to find the project library path earlier).
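
    A minimal sketch of the local half of that workflow, assuming a `project` handle obtained with `client.get_project(...)` as in your snippet. `zip_package` and `upload_zip` are hypothetical helper names. `get_managed_folder` and `put_file` are the `dataikuapi` calls I would expect to use here, so please check them against the API reference for your DSS version.

    ```python
    import os
    import shutil


    def zip_package(package_dir, archive_base):
        """Zip package_dir, keeping its top-level folder name, and return the .zip path."""
        package_dir = os.path.abspath(package_dir)
        parent = os.path.dirname(package_dir)
        name = os.path.basename(package_dir)
        # make_archive appends ".zip" to archive_base and returns the resulting file name
        return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)


    def upload_zip(project, folder_id, zip_path):
        """Upload the archive to a managed folder through a dataikuapi project handle."""
        # folder_id is the id of an existing managed folder (assumed to be known)
        folder = project.get_managed_folder(folder_id)
        with open(zip_path, "rb") as f:
            folder.put_file(os.path.basename(zip_path), f)
    ```

    For example: `upload_zip(project, "folderID", zip_package("package", "package"))`, where both `"folderID"` and the folder name `package` are placeholders.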

    You can find a managed folder's filesystem path with the code below:

    import dataiku

    folder = dataiku.Folder("folderID")  # replace with your managed folder id
    folder_path = folder.get_path()  # filesystem path of the managed folder
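
    The unzip step on the DSS side could then look roughly like this. It is only a sketch: `extract_library` is a hypothetical helper, and it assumes the notebook user can write to the project library directory, which will not be the case on every instance.

    ```python
    import os
    import zipfile


    def extract_library(zip_path, libraries_path):
        """Extract the uploaded archive into the library directory and list its members."""
        with zipfile.ZipFile(zip_path) as zf:
            # Files already present under the same names are overwritten
            zf.extractall(libraries_path)
            return zf.namelist()
    ```

    For example: `extract_library(os.path.join(folder_path, "package.zip"), libraries_path)`, where `package.zip` is a placeholder for the uploaded file name. Files deleted from the package locally stay in the library until you remove them yourself.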

    I hope this helps.

    -Best

  • epsi95
    epsi95 Dataiku DSS Core Concepts, Registered Posts: 15 ✭✭✭✭

    Hi @VitaliyD
    I am getting the following error; it seems I need permission to read from and write to this specific directory. So far I have successfully uploaded the zipped file to the filesystem folder and unzipped it. Once I get the permission, I will overwrite the folders in the library.

    ```python

    ---------------------------------------------------------------------------
    PermissionError Traceback (most recent call last)
    <ipython-input-5-7ebe77cfbc2d> in <module>()
    4 libraries_path = os.path.join(dip_home, 'config/projects/' + project_name + '/lib')
    5 print(libraries_path)
    ----> 6 print(os.listdir(libraries_path))

    PermissionError: [Errno 13] Permission denied: '/app/dataiku_design/dssdata-9.0.3/config/projects/xxxx/lib'

    ```

  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker

    Hi, this most likely means that User Isolation (UIF) is enabled on your instance. In that case, the DSS user running this Python code cannot modify DSS or system directories unless permissions are changed manually, which defeats the purpose of enabling UIF in the first place. That may be acceptable on a test instance, but it is definitely not recommended for production. You can check whether UIF is enabled in the DSS settings (Administration > Settings > Login (LDAP, SSO) & Security).
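
    Separately from the settings page, a quick way to confirm from a notebook whether the current user can read and write a directory is a check along these lines. `check_access` is a hypothetical helper, and the result only reflects ordinary filesystem permissions for the OS user running the code.

    ```python
    import os


    def check_access(path):
        """Report whether the current OS user can read and write the given directory."""
        return {
            "exists": os.path.isdir(path),
            # Listing a directory needs read and execute permission
            "readable": os.access(path, os.R_OK | os.X_OK),
            # Creating files in a directory needs write and execute permission
            "writable": os.access(path, os.W_OK | os.X_OK),
        }
    ```

    Calling `check_access(libraries_path)` before copying anything avoids a half-finished copy when the directory turns out to be read-only.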

    -Best

  • epsi95
    epsi95 Dataiku DSS Core Concepts, Registered Posts: 15 ✭✭✭✭

    Yes, User Isolation is enabled. I have asked the admin to grant me read and write access to that specific library path.
