
How to get the absolute path of a file in a managed folder

shoareau
Level 1

Hello,

I have a project in which I created a managed folder and successfully uploaded a text file and a binary file to it.

To access the contents of my folder, I use:

import dataiku

REPO = dataiku.Folder("quNauGux")
info = REPO.get_info()  # keep the Folder handle in REPO; store the metadata separately

list_files = REPO.list_paths_in_partition()

It shows me the 2 files, as expected:

'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis',
'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/Test_file.txt'

To read the content of the text file, I use:

with REPO.get_download_stream(list_files[1]) as stream:
    data = stream.readline()
    print("First line of myfile is: {}".format(data))

For the binary file, however, I want to pass it to a specific Python function (let's say decode_function).

So I want to call something like:

decode_function(list_files[0]) 

But it fails with:

- OSError: '/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis' is not an existing regular file

There is an issue with the file path.

How can I get the absolute path of the file inside DSS?

 

Thank you

 

1 Reply
KeijiY
Dataiker

Hello @shoareau ,

Thank you for posting the question on Community.

> How can I get the absolute path of the file inside DSS?

If your managed folder is stored on the local filesystem of the DSS host, you can use the Folder.file_path(filename) function to get the filesystem path of a given file within the folder. Here is some sample code:

import dataiku

FOLDER_ID = 'VdrA1ZMC'

REPO = dataiku.Folder(FOLDER_ID)
list_files = REPO.list_paths_in_partition()
# Absolute path on the DSS host, usable by functions that expect a file path
print(REPO.file_path(list_files[0]))

 


Please note that Folder.file_path(filename) is available only if your managed folder is stored on the local filesystem of the DSS host. If your managed folder is stored elsewhere, such as on HDFS or S3, this function cannot be used, and you will need to call Folder.get_download_stream(path) to retrieve the file contents from the folder, as follows.

import dataiku

FOLDER_ID = 'RoQHOCas'

REPO = dataiku.Folder(FOLDER_ID)
list_files = REPO.list_paths_in_partition()
with REPO.get_download_stream(list_files[0]) as stream:
    data = stream.read()
    # You can handle / manipulate the file contents (data) here.
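
If decode_function only accepts a filesystem path, one possible workaround for non-local folders is to copy the stream to a temporary file first and pass that path instead. The sketch below is a minimal illustration, not an official Dataiku recipe: the helper names stream_to_temp_file and local_path_for are hypothetical, and it assumes that Folder.file_path raises an exception when the folder is not on the local filesystem.

```python
import os
import shutil
import tempfile


def stream_to_temp_file(stream, suffix=""):
    """Copy a binary file-like object to a temporary file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a path-based function can open it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


def local_path_for(folder, path):
    """Return (local_path, is_temporary) for a file in a managed folder."""
    try:
        # Works only when the folder lives on the DSS host's local filesystem.
        return folder.file_path(path), False
    except Exception:
        # Assumption: file_path() raises for remote storage such as S3 or HDFS.
        suffix = os.path.splitext(path)[1]
        with folder.get_download_stream(path) as stream:
            return stream_to_temp_file(stream, suffix), True
```

With a real folder, you would call local_path_for(dataiku.Folder(FOLDER_ID), list_files[0]), pass the returned path to decode_function, and delete the file with os.remove afterwards if the second returned value is True.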

 

I hope this helps. Please let us know if you have any further questions.

Sincerely,
Keiji, Dataiku Technical Support
