Can we download Dataiku datasets into our local machine programmatically?

Sajid
Sajid Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 17 Partner

I want to programmatically download the output dataset of the flow into my local machine. How can I do this?

I tried following some suggestions from Chat GPT, it doesn't work. I get max retries reached an error. Can anyone help me with this?

#Import libraries
import dataiku
import requests
import os
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

#Replace with your project key and managed folder ID
project_key = 'PROJECT_KEY'
managed_folder_id = '5kdvyus'

#Replace with your Dataiku URL and API key
dataiku_url = 'https://dataiku.com # Assuming HTTPS and port 443 is default
api_key = 'ABCDEFGH'

#Initialize the Dataiku API client
client = dataiku.api_client()


#Get the project and managed folder
project = client.get_project(project_key)
managed_folder = project.get_managed_folder(managed_folder_id)

#List the contents of the managed folder
contents = managed_folder.list_contents()
if not contents['items']:
raise Exception("No files found in the managed folder.")

#Download the first file in the managed folder
file_path = contents['items'][0]['path']
download_url = f"{dataiku_url}/dip/api/managedfolder/{managed_folder_id}/contents/{file_path}"

#Configure retries for the requests session
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

#Download the file
try:
response = session.get(download_url, headers={'Authorization': f'Bearer {api_key}'})
response.raise_for_status()
with open('filtered_data.xlsx', 'wb') as f:
f.write(response.content)
print("File downloaded successfully")
except requests.exceptions.RequestException as e:
print(f"Failed to download the file: {e}")

Operating system used: WIndows

Answers

Setup Info
    Tags
      Help me…