Uploading a file to a folder using the Python API

Turribeach
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,591 Neuron

Hi, does anyone have a code snippet showing how to upload a file to a Dataiku managed folder using the external Python API? Thanks!

Best Answer

  • Turribeach
    Turribeach
    edited July 2024 Answer ✓

    After some Googling and help from support, I figured out how to do this with the Python API client package (dataikuapi), which is the recommended package when working outside DSS. There seems to be an undocumented method (get_managed_folder) that is not shown on the API client definition page.

    import dataikuapi

    # Set Dataiku URL and API Key
    host = "https://your_dss_URL"
    apiKey = "paste your API key here"

    # Create API client
    client = dataikuapi.DSSClient(host, apiKey)

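    # Disable TLS certificate verification, as it may fail if this machine lacks the root CA that signed the DSS certificate
    # (insecure, and relies on the private _session attribute, so only use it if you need to)
    client._session.verify = False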

    # Get a handle to the Dataiku project, must use Project Key, take it from Project Home URL, must be all in uppercase
    project = client.get_project("PROJECT_KEY")

    # Get a handle to the managed folder you want to upload a file to, must use Folder ID, take it from URL when browsing the folder in Dataiku. Case sensitive!
    managedfolder = project.get_managed_folder("folder_id")

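    # Upload a local file to the managed folder
    # Raw string so the Windows backslashes are not read as escapes ("\f" is a form feed); binary mode so the bytes are sent unchanged
    with open(r"C:\full_path_to\same_file.csv", "rb") as file:
        managedfolder.put_file("same_file.csv", file)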
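The `put_file(path, f)` call above can be wrapped in a small reusable helper. This is a minimal sketch, assuming you already hold a dataikuapi `DSSManagedFolder` handle such as `managedfolder` from the snippet above; the function name `upload_local_file` and its default of reusing the local file name are my own choices, not part of dataikuapi. Opening the file in binary mode means the bytes are uploaded exactly as they are on disk.

```python
import os


def upload_local_file(managed_folder, local_path, remote_path=None):
    """Upload one local file to a Dataiku managed folder via put_file.

    Hypothetical helper (not part of dataikuapi). `managed_folder` is expected
    to be a DSSManagedFolder, e.g. from project.get_managed_folder("folder_id").
    If remote_path is omitted, the local file name is reused.
    Returns the path used inside the managed folder.
    """
    if remote_path is None:
        remote_path = os.path.basename(local_path)
    # Binary mode: no newline or encoding translation before the upload
    with open(local_path, "rb") as f:
        managed_folder.put_file(remote_path, f)
    return remote_path
```

For example, `upload_local_file(managedfolder, r"C:\full_path_to\same_file.csv")` would store the file as `same_file.csv` (on Windows, where `os.path.basename` understands backslashes).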

Answers

  • Andrey
    Andrey Dataiker Alumni Posts: 119 ✭✭✭✭✭✭✭
    edited July 2024

    Hi @Turribeach,

    Here you go

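    import dataiku

    # Placeholder: path of the file on your local machine
    local_file_path = "/path/to/local_file.csv"

    folder = dataiku.Folder("FOLDER_ID")
    # Upload the local file into the managed folder as /uploaded_file.csv
    folder.upload_file("/uploaded_file.csv", local_file_path)

    Please also refer to the managed folder documentation for more info. There are other methods, such as upload_stream and upload_data.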
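As a sketch of `upload_data`, the helper below builds a CSV in memory with the standard `csv` module and uploads it without writing a temporary file first. It assumes `folder` is a `dataiku.Folder` handle like the one above; the helper name `upload_rows_as_csv` is hypothetical, and it encodes the text to UTF-8 bytes itself rather than relying on how `upload_data` treats `str` input.

```python
import csv
import io


def upload_rows_as_csv(folder, path, header, rows):
    """Write header + rows as CSV in memory and upload them with folder.upload_data.

    Hypothetical helper; `folder` is expected to be a dataiku.Folder handle.
    Returns the number of bytes uploaded.
    """
    buffer = io.StringIO()
    # "\n" line endings instead of the csv module's default "\r\n"
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    data = buffer.getvalue().encode("utf-8")
    folder.upload_data(path, data)
    return len(data)
```

For example, `upload_rows_as_csv(folder, "/report.csv", ["name", "count"], [["a", 1], ["b", 2]])`. For large local files, `upload_file` or `upload_stream` avoids holding everything in memory.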
  • Turribeach
    Turribeach
    edited July 2024

    I am getting this exception:

    Exception: Default project key is not specified (no DKU_CURRENT_PROJECT_KEY in env)

    Isn't the dataiku package supposed to be used only from within DSS, and dataikuapi from outside DSS?

  • Turribeach
    Turribeach

    Solved it with this.

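    import os

    # Tell the dataiku package which project to use when running outside DSS (set before creating dataiku.Folder)
    os.environ["DKU_CURRENT_PROJECT_KEY"] = "PROJECT_KEY"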
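The environment variable needs to be in place before the `dataiku.Folder(...)` handle is created. A minimal sketch of a guard for that, assuming you want to keep any key that is already set (for example when the same script also runs inside DSS); the helper name `ensure_default_project_key` is hypothetical.

```python
import os


def ensure_default_project_key(project_key):
    """Set DKU_CURRENT_PROJECT_KEY unless it is already set.

    Hypothetical helper: call it before creating dataiku.Folder(...) outside DSS.
    Returns the project key that will actually be used.
    """
    # setdefault only writes the value when the variable is missing
    return os.environ.setdefault("DKU_CURRENT_PROJECT_KEY", project_key)
```

Call `ensure_default_project_key("PROJECT_KEY")` first, then `import dataiku` and `dataiku.Folder("FOLDER_ID")` as above. If you always want to force a specific project, assign the variable directly as in the original snippet.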

  • Turribeach
    Turribeach

    OK, get_managed_folder is documented here. Dataiku has split the API classes across different pages, which I think makes it harder for someone new to understand the API. I thought the https://doc.dataiku.com/dss/latest/python-api/client.html page was the whole API documentation. It is also very confusing to have both APIs (dataiku and dataikuapi) on the same page. In addition, the two APIs do things in different ways: with dataiku you use dataiku.Folder("FOLDER_ID") to get a folder handle, whereas with dataikuapi you first need to get a handle to a project and then call project.get_managed_folder('FOLDER_ID'). Finally, there are no examples of how to use get_managed_folder.

  • dromero
    dromero Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 4 ✭✭✭✭

    Agree with @Turribeach. I also find using the REST API in Dataiku very confusing, and there are no concrete examples at all.
