Added on March 25, 2022 4:25PM
Hello,
I have a project in which I created a managed folder and successfully loaded a text file and a binary file into it.
To access the contents of my folder, I use:
import dataiku
REPO = dataiku.Folder("quNauGux")
info = REPO.get_info()  # kept in a separate variable so REPO remains the Folder object
list_files = REPO.list_paths_in_partition()
It shows the 2 files, as expected:
'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis',
'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/Test_file.txt'
To read the content of the text file, I use:
with REPO.get_download_stream(list_files[1]) as stream:
    data = stream.readline()
    print("First line of myfile is: {}".format(data))
For the binary file, however, I want to pass it to a specific Python function (let's say decode_function).
So I want to call something like:
decode_function(list_files[0])
But it fails because of:
- OSError: '/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis' is not an existing regular file
There is an issue with the file path.
How can I get the absolute path of the file inside DSS?
Thank you
Hello @shoareau,
Thank you for posting your question on the Community.
> How can I get the absolute path of the file inside DSS?
If your managed folder is stored on the local filesystem of the DSS host, you can use the Folder.file_path(filename) function to get the filesystem path of a given file within the folder. Here is some sample code:
import dataiku
FOLDER_ID = 'VdrA1ZMC'
REPO = dataiku.Folder(FOLDER_ID)
list_files = REPO.list_paths_in_partition()
REPO.file_path(list_files[0])
Please note that Folder.file_path(filename) is available only if your managed folder is stored on the local filesystem of the DSS host. If your managed folder is stored elsewhere, such as on HDFS or S3, this function cannot be used, and you will need to call Folder.get_download_stream(path) to retrieve the file contents from the folder, as follows:
import dataiku
FOLDER_ID = 'RoQHOCas'
REPO = dataiku.Folder(FOLDER_ID)
list_files = REPO.list_paths_in_partition()
with REPO.get_download_stream(list_files[0]) as stream:
    data = stream.read()
    # You can handle / manipulate the file contents (data) here.
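If decode_function only accepts a filesystem path (as the OSError in the question suggests), one possible workaround for non-local folders is to copy the stream into a temporary file and pass that file's path instead. The sketch below is only an illustration under assumptions: local_copy is a hypothetical helper name, not part of the dataiku API, and it assumes the object returned by get_download_stream() can be used in a with statement (as in the code above) and supports read(size).

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def local_copy(folder, path):
    """Yield a local filesystem path holding a temporary copy of `path`.

    `folder` is assumed to behave like dataiku.Folder, i.e. to provide
    get_download_stream(path) returning a binary, file-like context manager.
    `local_copy` itself is a hypothetical helper, not a dataiku function.
    """
    suffix = os.path.splitext(path)[1]  # keep the extension, e.g. ".dlis"
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        # Copy the remote contents chunk by chunk, then close both handles.
        with tmp, folder.get_download_stream(path) as stream:
            shutil.copyfileobj(stream, tmp)
        yield tmp.name
    finally:
        # Remove the temporary copy once the caller is done with it.
        os.remove(tmp.name)


# Possible usage inside DSS (REPO, list_files and decode_function as above):
#     with local_copy(REPO, list_files[0]) as local_path:
#         result = decode_function(local_path)
```

The temporary file exists only inside the with block, so anything decode_function returns must not depend on the file still being present afterwards.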
I hope this helps. Please let us know if you have any further questions.
Sincerely,
Keiji, Dataiku Technical Support