We are migrating Dataiku from an on-prem server to AWS. There is a project that currently uses a managed folder containing a few CSV files as an input to an R recipe. Post-migration we wish to use an S3 location (via an HDFS connection) for all storage. When I repoint the above managed folder to the S3 HDFS connection and try to run the recipe, I get an error (see attached screenshot).
Please let me know if it is at all possible to use S3 in the above scenario. If not, could you please point me in the right direction for the "read/write API" mentioned in the error?
Are you trying to read/write data from/to a managed folder manually by constructing a path with "get_path" or "file_path"?
If yes, you'd need to use "get_download_stream" and "upload_stream" for reading and writing operations instead.
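For example, a minimal sketch assuming the dataiku Python package, with the folder id taken from this thread; "MyFile.csv" and "MyFile_out.csv" are hypothetical file names:

import io
import dataiku

folder = dataiku.Folder("WCrIUW3D")

# Read: stream the file contents through DSS instead of building a filesystem path
with folder.get_download_stream("MyFile.csv") as stream:
    data = stream.read()

# Write: push bytes back through the streaming API
folder.upload_stream("MyFile_out.csv", io.BytesIO(data))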
Hi @Andrey, thanks for this. Our data scientists have a huge R script that picks up files from a managed folder. The code looks something like this:
library(dataiku)

MyManagedFolder <- dkuManagedFolderPath("WCrIUW3D")
MyDataset <- read.csv(paste0(MyManagedFolder, "/MyFile.csv"), stringsAsFactors = FALSE, header = FALSE)
Note there are many more files in such a folder, so ideally we would like an R-based solution if at all possible.
My bad, I missed the fact that you're using R and proposed a Python solution.
In the case of R, the equivalent solution would be to use dkuManagedFolderDownloadPath.
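A minimal sketch, assuming the dataiku R package and the folder id and file name from your script (the read.csv options are carried over from the original):

library(dataiku)

# Stream the file contents through DSS rather than resolving a local path;
# as = "text" returns the whole file as a single character string
contents <- dkuManagedFolderDownloadPath("WCrIUW3D", "MyFile.csv", as = "text")

# Parse the CSV text in memory with the same options as before
MyDataset <- read.csv(text = contents, stringsAsFactors = FALSE, header = FALSE)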
In this case you rely on DSS to read the data from the file storage behind the managed folder and return the result, depending on what you pass as the "as" parameter. Since the read goes through DSS rather than the local filesystem, it works the same whether the folder is backed by local storage or by your S3/HDFS connection.