Can we download Dataiku datasets to our local machine programmatically?
I want to programmatically download the output dataset of the flow to my local machine. How can I do this?
I tried following some suggestions from ChatGPT, but it doesn't work: I get a "max retries reached" error. Can anyone help me with this?
```python
# Import libraries
import dataiku
import requests
import os
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Replace with your project key and managed folder ID
project_key = 'PROJECT_KEY'
managed_folder_id = '5kdvyus'

# Replace with your Dataiku URL and API key
dataiku_url = 'https://dataiku.com'  # Assuming HTTPS and port 443 is default
api_key = 'ABCDEFGH'

# Initialize the Dataiku API client
client = dataiku.api_client()

# Get the project and managed folder
project = client.get_project(project_key)
managed_folder = project.get_managed_folder(managed_folder_id)

# List the contents of the managed folder
contents = managed_folder.list_contents()
if not contents['items']:
    raise Exception("No files found in the managed folder.")

# Download the first file in the managed folder
file_path = contents['items'][0]['path']
download_url = f"{dataiku_url}/dip/api/managedfolder/{managed_folder_id}/contents/{file_path}"

# Configure retries for the requests session
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

# Download the file
try:
    response = session.get(download_url, headers={'Authorization': f'Bearer {api_key}'})
    response.raise_for_status()
    with open('filtered_data.xlsx', 'wb') as f:
        f.write(response.content)
    print("File downloaded successfully")
except requests.exceptions.RequestException as e:
    print(f"Failed to download the file: {e}")
```
Operating system used: Windows
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
I'm afraid that is a load of garbage code you got from ChatGPT. Where will this Python code be running: inside Dataiku or outside Dataiku?
-
Can you help?
I have been using the Dataiku API, so getting details from managed folders works fine. What I need is a way to download that file to my local machine.
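Since the managed folder is already reachable through the API, a minimal sketch of pulling one of its files down to disk could use the client's own folder object rather than a hand-built REST URL. It assumes the code runs on the machine that should receive the file, that `folder` is a `DSSManagedFolder` from the `dataikuapi` package, that `list_contents()` returns a dict with an `items` list carrying a `path` per file, and that `get_file(path)` returns a `requests` response. The function name `download_first_file` is hypothetical.

```python
import os


def download_first_file(folder, dest_dir="."):
    """Copy the first file of a Dataiku managed folder to a local directory.

    Assumption: `folder` behaves like dataikuapi's DSSManagedFolder, i.e.
    list_contents() returns {"items": [{"path": ...}, ...]} and
    get_file(path) returns a requests.Response with the file's bytes.
    """
    items = folder.list_contents().get("items", [])
    if not items:
        raise FileNotFoundError("No files found in the managed folder.")
    remote_path = items[0]["path"]
    local_path = os.path.join(dest_dir, os.path.basename(remote_path))
    response = folder.get_file(remote_path)
    try:
        with open(local_path, "wb") as out:
            # Write in chunks so a large export is not held fully in memory.
            for chunk in response.iter_content(chunk_size=1 << 20):
                out.write(chunk)
    finally:
        response.close()
    return local_path


# Hypothetical usage from a local Python environment:
# import dataikuapi
# client = dataikuapi.DSSClient("https://your-dss-host", "YOUR_API_KEY")
# folder = client.get_project("PROJECT_KEY").get_managed_folder("5kdvyus")
# print(download_first_file(folder))
```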
-
Turribeach
I can, but you have not answered my question. Where is the Python code that downloads the dataset going to run? This determines which Dataiku API package you need to use. Are you going to run it inside a Dataiku recipe/Dataiku notebook, or somewhere else, like another Python runtime environment on your machine?
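To make the distinction concrete, here is a minimal sketch (the function name `get_dss_client` is hypothetical). Inside DSS, the built-in `dataiku` package provides `dataiku.api_client()`; outside DSS, the separate `dataikuapi` package (typically installed with `pip install dataiku-api-client`) builds a client from the instance URL and an API key. The imports are deferred so the function can be defined in either environment.

```python
def get_dss_client(host=None, api_key=None):
    """Return a Dataiku DSS API client suited to where the code runs.

    Inside DSS (Python recipe, notebook, scenario "Execute Python code"
    step), the built-in `dataiku` package returns an already
    authenticated client. Anywhere else, the standalone `dataikuapi`
    package needs the instance URL and an API key.
    """
    if host is None:
        import dataiku  # importable only inside DSS
        return dataiku.api_client()
    import dataikuapi  # assumed installed via: pip install dataiku-api-client
    return dataikuapi.DSSClient(host, api_key)
```

Either client then exposes the same project-level calls, such as `get_project(...)`.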
-
I want to use this code either in a Python recipe or possibly as an "Execute code" scenario step.
The use case: we run a flow/scenario from an Application Designer form when the user clicks a "Run" button. We can also add a "Download" button so users can download the dataset to their local machine, and that works well. The ask is that both the flow/scenario execution and the dataset download happen with one click, so I am thinking of adding a program that downloads the dataset as the last step of the scenario.
-
Turribeach
"Application Designer form": what exactly is that?
-
Dataiku Applications, where one can build user forms without code. We can ignore the Application, though, as the download has to be done with Python code. I want to download the dataset to a local machine using either a Python recipe or an "Execute code" scenario step.
Thanks
-
Turribeach
OK, the download code itself should be easy, but you can't really download from a scenario. A scenario executes unattended: how is it supposed to download a file to a user's machine? I am not sure I understand how you want this to work.
-
Turribeach
Neither a recipe nor a scenario can trigger a download, because they are non-interactive ways of running code. You will need to use something like a Dataiku webapp, which is an interactive framework:
https://knowledge.dataiku.com/latest/data-viz/webapps/code-download-from-webapp.html
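A minimal sketch of the webapp side, assuming a standard webapp with a Flask-based Python backend: the browser requests a backend route, and an `attachment` Content-Disposition header makes the browser save the response as a file on the user's machine. The route name, folder ID, file path, and `read_export` helper are illustrative assumptions; in a DSS standard webapp the backend already provides `app`, so the `Flask(__name__)` line would be dropped, and the front end would typically point a link at `getWebAppBackendUrl('/download')`.

```python
from flask import Flask, Response

# Standalone app for illustration only; a DSS standard webapp backend
# already defines `app`.
app = Flask(__name__)

# Hypothetical values: replace with your own managed folder ID and file path.
FOLDER_ID = "5kdvyus"
FILE_PATH = "/filtered_data.xlsx"
XLSX_MIMETYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def read_export():
    """Read the exported file's bytes from a managed folder (runs inside DSS)."""
    import dataiku  # importable only inside DSS
    with dataiku.Folder(FOLDER_ID).get_download_stream(FILE_PATH) as stream:
        return stream.read()


@app.route("/download")
def download():
    # "attachment" tells the browser to save the body as a file
    # instead of trying to display it.
    return Response(
        read_export(),
        mimetype=XLSX_MIMETYPE,
        headers={"Content-Disposition": 'attachment; filename="filtered_data.xlsx"'},
    )
```

Because the download is triggered by a browser request, the file lands on the machine of whoever clicked the link, which is exactly what a recipe or scenario cannot do.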
-
I tried a webapp at first and it worked, but the ask is to use Applications to download the file.
As the image below shows, users have to click "Run" after entering a phone number (the user input used as a filter). Clicking it executes a scenario; once the scenario succeeds, I can click the "Download" button to get the data on my local machine.
The ask was to find a way to do all of these steps with a single click, i.e. with only one button. My idea was to add a download program (Python code) at the end of the scenario so that we would not need the "Download" button.
But since scenarios and recipes are not interactive, I understand that won't work.
Thanks
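For the single-click requirement described in this thread, one option consistent with the answers is to let an interactive webapp do both steps: its backend route runs the scenario and only then returns the file. A minimal sketch of that helper, assuming `project` is a `dataikuapi` `DSSProject` (for example from `dataiku.api_client().get_project(...)` in a webapp backend), that `DSSScenario.run_and_wait()` blocks until the run ends and raises on failure by default, and that `get_file(path)` returns a `requests` response. The name `run_then_fetch` and its arguments are hypothetical, and passing the phone-number filter to the scenario is left out.

```python
def run_then_fetch(project, scenario_id, folder_id, file_path):
    """Run a scenario to completion, then return one managed-folder file's bytes.

    Assumption: run_and_wait() raises if the run fails (its default
    behaviour in dataikuapi), so the file is only read after a
    successful run.
    """
    project.get_scenario(scenario_id).run_and_wait()
    response = project.get_managed_folder(folder_id).get_file(file_path)
    try:
        return response.content
    finally:
        response.close()
```

A webapp download route could return these bytes with an `attachment` Content-Disposition header. The browser request stays open while the scenario runs, so a long scenario may run into proxy or browser timeouts.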