Hello community,
I have a use case where I want to upload a file to a DSS managed folder via the Python API. The question came up whether there is a way to get the checksum of the file in the managed folder via the API. I want to verify after the upload that the file in DSS is indeed still the same file that I uploaded.
Is this possible via the API, or would I have to connect to the server directly via SSH? Any other ideas on how to achieve this?
Thank you for any input!
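For reference, one way to do this verification without SSH is to stream the uploaded file back out of the managed folder and hash it in Python. The sketch below defines a generic md5-over-stream helper; the commented usage assumes the `dataiku` package's `Folder.get_download_stream`, and the folder and file names are hypothetical placeholders:

```python
import hashlib


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Compute the hex md5 digest of any binary file-like object, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_local_file(path):
    """md5 of the local copy you uploaded, for comparison."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


# Inside DSS, the same helper can hash the uploaded copy by streaming it
# back from the managed folder (a sketch; folder/file names are examples):
#
#   import dataiku
#   folder = dataiku.Folder("my_managed_folder")
#   with folder.get_download_stream("/myfile.bin") as remote:
#       remote_md5 = md5_of_stream(remote)
#   assert remote_md5 == md5_of_local_file("/local/path/myfile.bin")
```

This avoids trusting the upload path twice: the comparison is between the local bytes and the bytes DSS actually stored.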
@stanjer ,
I am working with files on a locally mounted volume. Here is the script I use to compute md5 checksums from a Macintosh computer. Other OS variants may need slightly different scripts.
The script outputs one md5 line per file.
You could put in an echo statement to output a header.
MOUNTVOLUME="test"
find "/Volumes/$MOUNTVOLUME" -type f -exec md5 {} ';'
I then use a visual prepare recipe with a few steps to extract the useful data.
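If it helps, macOS `md5` prints one line per file in the form `MD5 (/path/to/file) = <hash>`, so the extraction the prepare recipe does could also be sketched in a few lines of Python (this is an illustration of the parsing, not the actual recipe steps):

```python
import re

# macOS/BSD `md5` prints one line per file: MD5 (/path/to/file) = <32 hex chars>
MD5_LINE = re.compile(r"^MD5 \((?P<path>.*)\) = (?P<md5>[0-9a-f]{32})$")


def parse_md5_output(text):
    """Turn raw `md5` output into (path, md5) tuples, skipping non-matching lines."""
    rows = []
    for line in text.splitlines():
        m = MD5_LINE.match(line.strip())
        if m:
            rows.append((m.group("path"), m.group("md5")))
    return rows
```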
Note that this method would only work with some kind of locally mounted file system.
This method is also considerably faster than many methods I've tried. I was able to do ~450,000 files in about 8-9 hours on a slowish connection.
Welcome to the Dataiku Community.
You might find the following threads a bit helpful.
On a project I ran a few years ago, I found that doing checksums via a Shell recipe was several times faster than other methods. Whether that holds for you will depend on the size of your file(s) and on whether your use case needs to go through Dataiku Connections.
Hi Tom,
Thank you for the reply, and sorry that it took me a little longer to get back. Your comment was indeed very helpful for us: although we have put the topic in the backlog for now, the shell recipe is an interesting solution I didn't have on my radar until now.
Kind regards
Jan