How to get the absolute path of a file in a managed folder
Hello,
I have a project in which I created a managed folder and successfully uploaded a text file and a binary file to it.
To access the contents of my folder, I use:
import dataiku
REPO = dataiku.Folder("quNauGux")
info = REPO.get_info()  # kept in its own variable so REPO remains the Folder handle
list_files = REPO.list_paths_in_partition()
It shows me 2 files, as expected:
'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis',
'/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/Test_file.txt'
To read the content of the text file, I use:
with REPO.get_download_stream(list_files[1]) as stream:
    data = stream.readline()
    print("First line of myfile is: {}".format(data))
For the binary file, however, I want to pass it to a specific Python function (let's say decode_function) that expects a file path.
So I want to call something like:
decode_function(list_files[0])
But it fails because of:
- OSError: '/dQsWyctJnkGLPfYLqCoZcrQJaDeKvPAz/binary.dlis' is not an existing regular file
There seems to be an issue with the file path.
How can I get the absolute path of the file inside DSS?
Thank you
Answers
Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
Hello @shoareau,
Thank you for posting the question on Community.
> How can I get the absolute path of the file inside DSS?
If your managed folder is stored on the local filesystem of the DSS host, you can use the Folder.file_path(filename) function to extract the filesystem path for a given file within a folder. Here is sample code.
import dataiku
FOLDER_ID = 'VdrA1ZMC'
REPO = dataiku.Folder(FOLDER_ID)
list_files = REPO.list_paths_in_partition()
REPO.file_path(list_files[0])
Please note that the Folder.file_path(filename) function is available only if your managed folder is stored on the local filesystem of the DSS host. If your managed folder is stored elsewhere, such as on HDFS or S3, this function cannot be used, and you will need to call the Folder.get_download_stream(path) function to retrieve the file contents from the folder, as follows.
import dataiku
FOLDER_ID = 'RoQHOCas'
REPO = dataiku.Folder(FOLDER_ID)
list_files = REPO.list_paths_in_partition()
with REPO.get_download_stream(list_files[0]) as stream:
    data = stream.read()
# You can handle / manipulate the file contents (data) here.
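If the function you want to call (decode_function in your example) only accepts a file path, one possible workaround for a non-local folder is to copy the download stream into a temporary file and pass that file's path instead. The following is only a minimal sketch: stream_to_temp_file is a hypothetical helper name, not part of the Dataiku API, and an in-memory io.BytesIO object stands in for the stream so that the snippet runs outside DSS.

```python
import io
import os
import shutil
import tempfile


def stream_to_temp_file(stream, suffix=""):
    """Copy a binary file-like object to a temporary file and return its path.

    Hypothetical helper (not part of the Dataiku API). The caller is
    responsible for deleting the file afterwards.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


# Stand-in for the stream returned by REPO.get_download_stream(list_files[0]).
fake_stream = io.BytesIO(b"\x00\x01binary payload")

local_path = stream_to_temp_file(fake_stream, suffix=".dlis")
try:
    # A path-based function such as decode_function(local_path) could be
    # called here; this sketch simply reads the bytes back.
    with open(local_path, "rb") as f:
        content = f.read()
finally:
    os.remove(local_path)  # clean up the temporary copy
```

Inside DSS, the same helper would presumably be used as `with REPO.get_download_stream(list_files[0]) as stream: local_path = stream_to_temp_file(stream, ".dlis")`, assuming the returned stream can be read like an ordinary binary file object. Keep in mind that this writes a full copy of the file to the local disk, so it may not be suitable for very large files.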
I hope this helps. Please let us know if you have any further questions.
Sincerely,
Keiji, Dataiku Technical Support