Uploading a file to a folder using the Python API
Hi, does anyone have a code snippet of how to upload a file to a Dataiku folder using the external Python API? Thanks!
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,165 Neuron
After some Googling and help from support, I figured out how to do this using the Python API package (dataikuapi), which is the recommended package for use outside DSS. There seems to be an undocumented method (get_managed_folder) which is not shown on the API client definition page.
import dataikuapi
# Set Dataiku URL and API Key
host = "https://your_dss_URL"
apiKey = "paste your API key here"
# Create API client
client = dataikuapi.DSSClient(host, apiKey)
# Ignore SSL checks as these may fail without access to root CA certs
client._session.verify = False
# Get a handle to the Dataiku project, must use Project Key, take it from Project Home URL, must be all in uppercase
project = client.get_project("PROJECT_KEY")
# Get a handle to the managed folder you want to upload a file to, must use Folder ID, take it from URL when browsing the folder in Dataiku. Case sensitive!
managedfolder = project.get_managed_folder("folder_id")
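# Upload a local file to the managed folder
# Use a raw string so the backslashes are not read as escape sequences, and open in binary mode so the bytes are sent unchanged
with open(r"C:\full_path_to\same_file.csv", "rb") as file:
    managedfolder.put_file("same_file.csv", file)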
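For anyone who wants to reuse the snippet above, here is a minimal sketch that wraps the same calls in two small functions. The names upload_local_file and connect_folder are hypothetical helpers made up for illustration, not part of dataikuapi; the only library calls assumed are the ones already used above (DSSClient, get_project, get_managed_folder, put_file).

```python
from pathlib import Path


def upload_local_file(folder, local_path, remote_name=None):
    """Upload one local file to a managed folder handle.

    `folder` is anything exposing put_file(name, fileobj), for example the
    object returned by project.get_managed_folder("folder_id").
    """
    local_path = Path(local_path)
    # Default to the local file name when no remote name is given
    remote_name = remote_name or local_path.name
    # Binary mode so the bytes are sent unchanged
    with local_path.open("rb") as fh:
        folder.put_file(remote_name, fh)
    return remote_name


def connect_folder(host, api_key, project_key, folder_id):
    """Return a managed folder handle via dataikuapi (imported lazily)."""
    import dataikuapi  # assumes `pip install dataiku-api-client`

    client = dataikuapi.DSSClient(host, api_key)
    return client.get_project(project_key).get_managed_folder(folder_id)
```

Usage would then look something like: folder = connect_folder("https://your_dss_URL", "paste your API key here", "PROJECT_KEY", "folder_id"), followed by upload_local_file(folder, r"C:\full_path_to\same_file.csv").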
Answers
-
Hi @Turribeach, here you go:
import dataiku
folder = dataiku.Folder("FOLDER_ID")
folder.upload_file("/uploaded_file.csv", local_file_path)
Please also refer to the managed folder documentation for more info. There are other methods like upload_stream and upload_data.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,165 Neuron
I am getting this exception:
Exception: Default project key is not specified (no DKU_CURRENT_PROJECT_KEY in env)
Isn't the dataiku package supposed to be used only from within DSS, and shouldn't I use dataikuapi from outside DSS?
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,165 Neuron
Solved it with this.
import os
os.environ["DKU_CURRENT_PROJECT_KEY"] = "PROJECT_KEY"
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,165 Neuron
OK, get_managed_folder is documented here. Dataiku has split the API classes across different pages, which I think makes it harder for someone new to understand the API. I thought the https://doc.dataiku.com/dss/latest/python-api/client.html page was the whole API documentation. It is also very confusing to have both APIs (dataiku and dataikuapi) on the same page. In addition, the two APIs implement things in different ways: for dataiku you use dataiku.Folder("FOLDER_ID") to get a folder handle, whereas in dataikuapi you first need to get a handle to a project and then call project.get_managed_folder('FOLDER_ID'). Finally, there are no examples of how to use get_managed_folder.
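Since examples for get_managed_folder are hard to find, here is a rough sketch of one more thing the returned handle can be used for: checking that an upload actually landed. It assumes the handle's list_contents() returns a dict with an "items" list whose entries carry a "path" key, so treat that shape as an assumption to verify against your DSS version. remote_paths and is_uploaded are hypothetical helper names.

```python
def remote_paths(folder):
    """Return the paths of all files in a managed folder handle.

    Assumes list_contents() returns a dict with an "items" list whose
    entries each carry a "path" key.
    """
    return [item["path"] for item in folder.list_contents()["items"]]


def is_uploaded(folder, remote_name):
    """True if remote_name appears in the folder listing.

    Leading slashes are stripped on both sides, so "same_file.csv" and
    "/same_file.csv" are treated as the same file.
    """
    wanted = remote_name.lstrip("/")
    return any(path.lstrip("/") == wanted for path in remote_paths(folder))
```

With a handle from project.get_managed_folder('FOLDER_ID'), calling is_uploaded(folder, "same_file.csv") after put_file gives a quick sanity check without opening the DSS UI.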
-
dromero Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 4 ✭✭✭✭
Agree with @Turribeach. It's very confusing for me too to use the REST API in Dataiku, and no concrete examples are given at all.