
Uploading a file to a folder using the Python API

Solved!
Turribeach
Level 3
Hi, does anyone have a code snippet of how to upload a file to a Dataiku folder using the external Python API? Thanks!

1 Solution
Turribeach
Level 3
Author

After some Googling and help from support, I figured out how to do this with the Python API package (dataikuapi), which is the recommended package when working outside DSS. The method involved (get_managed_folder) seems to be undocumented: it is not shown on the API client definition page.

import dataikuapi

# Set the Dataiku URL and API key
host = "https://your_dss_URL"
apiKey = "paste your API key here"

# Create the API client
client = dataikuapi.DSSClient(host, apiKey)

# Ignore SSL checks, as these may fail without access to the root CA certs.
# Note: this sets a private attribute and disables certificate verification.
client._session.verify = False

# Get a handle to the Dataiku project. Use the Project Key (all uppercase), taken from the Project Home URL.
project = client.get_project("PROJECT_KEY")

# Get a handle to the managed folder you want to upload to. Use the Folder ID, taken from the URL
# when browsing the folder in Dataiku. Case sensitive!
managedfolder = project.get_managed_folder("folder_id")

# Upload a local file to the managed folder.
# The raw string keeps the backslashes literal (otherwise "\f" is read as a form feed),
# and binary mode sends the file's bytes unchanged.
with open(r"C:\full_path_to\same_file.csv", "rb") as file:
    managedfolder.put_file('same_file.csv', file)
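The upload step above can be wrapped in a small helper. This is only a sketch: upload_local_file is a hypothetical name, not part of dataikuapi, and it assumes nothing about the handle except that it has the put_file(name, file_object) method used in the snippet above.

```python
import os


def upload_local_file(managed_folder, local_path, remote_name=None):
    """Upload a local file through a managed folder handle.

    managed_folder is assumed to expose put_file(name, file_object),
    as the handle in the snippet above does. Returns the name used
    inside the managed folder.
    """
    # Default to the local file's base name when no remote name is given.
    if remote_name is None:
        remote_name = os.path.basename(local_path)
    # Open in binary mode so the bytes are passed through unchanged.
    with open(local_path, "rb") as f:
        managed_folder.put_file(remote_name, f)
    return remote_name
```

With the handle from the snippet above, the call would look like upload_local_file(managedfolder, r"C:\full_path_to\same_file.csv").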

5 Replies
Andrey
Dataiker

Hi @Turribeach ,

Here you go

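import dataiku

# Path of the file on the local disk (placeholder, replace with your own)
local_file_path = "/path/to/local_file.csv"

folder = dataiku.Folder("FOLDER_ID")
folder.upload_file("/uploaded_file.csv", local_file_path)

Please also refer to the managed folder documentation for more info. There are other methods, such as upload_stream and upload_data.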
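For comparison, here is a rough sketch of the three upload methods named above, used side by side. The function name upload_variants and the target paths are made up for illustration. The folder argument is assumed to behave like a dataiku.Folder, with upload_file taking a local path, upload_stream taking an open file object, and upload_data taking bytes. Check the managed folder documentation for the exact signatures in your DSS version.

```python
def upload_variants(folder, local_file_path):
    """Upload the same local file three ways (hypothetical helper)."""
    # 1. Hand over a path on disk and let the folder read the file.
    folder.upload_file("/from_path.csv", local_file_path)

    # 2. Hand over an open binary stream.
    with open(local_file_path, "rb") as stream:
        folder.upload_stream("/from_stream.csv", stream)

    # 3. Hand over the content as bytes already held in memory.
    with open(local_file_path, "rb") as stream:
        folder.upload_data("/from_bytes.csv", stream.read())
```

The stream variant is the natural choice for large files, since the content never has to be held in memory all at once.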
Andrey Avtomonov
R&D Engineer @ Dataiku
Turribeach
Level 3
Author

I am getting this exception:

Exception: Default project key is not specified (no DKU_CURRENT_PROJECT_KEY in env)

Isn't the dataiku package supposed to be used only from inside DSS, with dataikuapi being the one to use from outside DSS?

Turribeach
Level 3
Author

Solved it with this.

import os
os.environ["DKU_CURRENT_PROJECT_KEY"] = "PROJECT_KEY"
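If you would rather not leave the variable set for the whole process, one option is a small context manager that sets it and then restores the previous state. This is only a sketch using the standard library: default_project_key is a hypothetical helper name, and the variable name is the one quoted in the exception message. The assumption is that the dataiku package reads the variable when the folder handle is created, so the handle should be created inside the with block.

```python
import os
from contextlib import contextmanager

# Variable name taken from the exception message above.
PROJECT_KEY_VAR = "DKU_CURRENT_PROJECT_KEY"


@contextmanager
def default_project_key(project_key):
    """Temporarily set the default project key in the environment."""
    previous = os.environ.get(PROJECT_KEY_VAR)
    os.environ[PROJECT_KEY_VAR] = project_key
    try:
        yield
    finally:
        # Restore whatever was there before (or remove the variable).
        if previous is None:
            os.environ.pop(PROJECT_KEY_VAR, None)
        else:
            os.environ[PROJECT_KEY_VAR] = previous
```

Usage would be along the lines of: with default_project_key("PROJECT_KEY"): followed by the dataiku.Folder("FOLDER_ID") call inside the block.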

Turribeach
Level 3
Author

OK, get_managed_folder is documented here. Dataiku has split the API classes across different pages, which I think makes it harder for someone new to understand the API. I thought the https://doc.dataiku.com/dss/latest/python-api/client.html page was the whole API documentation. It is also very confusing to have both APIs (dataiku and dataikuapi) on the same page. In addition, the two APIs do things in different ways. With dataiku you call dataiku.Folder("FOLDER_ID") to get a folder handle, whereas with dataikuapi you first need to get a handle to a project and then call project.get_managed_folder('FOLDER_ID'). Finally, there are no examples of how to use get_managed_folder.
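To make the difference concrete, here is a minimal side-by-side sketch of the two ways of getting a folder handle described above. The two function names are made up for illustration. The first assumes a client object like the one returned by dataikuapi.DSSClient(host, apiKey). The second imports dataiku lazily, so the sketch still loads on a machine where that package is not installed.

```python
def folder_handle_via_dataikuapi(client, project_key, folder_id):
    """dataikuapi style: client -> project -> managed folder."""
    project = client.get_project(project_key)
    return project.get_managed_folder(folder_id)


def folder_handle_via_dataiku(folder_id):
    """dataiku style: the project comes from the environment."""
    import dataiku  # imported here so the sketch loads without the package

    return dataiku.Folder(folder_id)
```

The dataikuapi route names the project explicitly, which is why it does not hit the "Default project key is not specified" exception reported earlier in the thread.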
