How to get the absolute path of a file in a managed folder

shoareau Partner, Dataiku DSS Core Designer, Registered Posts: 8 Partner

Hello,

I have a project in which I created a managed folder and successfully loaded a text file and a binary file into it.

To access the contents of my folder, I use:

import dataiku

REPO = dataiku.Folder("quNauGux")
info = REPO.get_info()  # keep the Folder handle in REPO; store the metadata separately

list_files = REPO.list_paths_in_partition()

It shows me 2 files, as expected:

'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis',
'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/Test_file.txt'

To read the content of the text file, I use:

with REPO.get_download_stream(list_files[1]) as stream:
    data = stream.readline()
    print("First line of myfile is: {}".format(data))

But for the binary file, I want to process it by calling a specific Python function (let's say decode_function).

So I want to call something like:

decode_function(list_files[0])

But it fails because of:

- OSError: '/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis' is not an existing regular file

There is an issue with the file path.

How can I get the absolute path of the file inside DSS?

Thank you

Answers

  • Keiji
    Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker

    Hello @shoareau,

    Thank you for posting the question on Community.

    > How can I get the absolute path of the file inside DSS?

    If your managed folder is stored on the local filesystem of the DSS host, you can use the Folder.file_path(filename) function to get the filesystem path of a given file within the folder. Here is some sample code:

    import dataiku

    FOLDER_ID = 'VdrA1ZMC'
    REPO = dataiku.Folder(FOLDER_ID)
    list_files = REPO.list_paths_in_partition()
    REPO.file_path(list_files[0])


    Please note that Folder.file_path(filename) is available only if your managed folder is stored on the local filesystem of the DSS host. If the folder is stored elsewhere, such as on HDFS or S3, this function cannot be used, and you will need to call Folder.get_download_stream(path) to retrieve the file contents from the folder, as follows:

    import dataiku

    FOLDER_ID = 'RoQHOCas'
    REPO = dataiku.Folder(FOLDER_ID)
    list_files = REPO.list_paths_in_partition()
    with REPO.get_download_stream(list_files[0]) as stream:
        data = stream.read()
        # You can handle / manipulate the file contents (data) here.

    I hope this helps. Please let us know if you have any further questions.

    Sincerely,
    Keiji, Dataiku Technical Support
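
If the function you want to call (decode_function in the question) only accepts a file path and the folder is not on the local filesystem, one possible workaround is to copy the download stream into a temporary local file and pass that path instead. The following is a minimal sketch under that assumption, not an official Dataiku recipe. The helper name copy_stream_to_temp_file is hypothetical, and it assumes the object returned by Folder.get_download_stream(path) behaves like a binary file-like object with a read() method.

```python
import os
import shutil
import tempfile


def copy_stream_to_temp_file(stream, suffix=""):
    """Copy a binary file-like object to a temporary file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so a path-based function can open it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


# Intended use inside DSS (shown as comments because it needs a DSS
# environment; decode_function is the path-based function from the question):
#
#     import dataiku
#     REPO = dataiku.Folder(FOLDER_ID)
#     list_files = REPO.list_paths_in_partition()
#     with REPO.get_download_stream(list_files[0]) as stream:
#         local_path = copy_stream_to_temp_file(stream, suffix=".dlis")
#     try:
#         decode_function(local_path)
#     finally:
#         os.remove(local_path)
```

The temporary copy costs extra disk space and I/O, so for folders on the local filesystem Folder.file_path(filename) remains the simpler choice.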
