
reading xls and ppt from a managed folder on an S3 connector

Pascal_B
Level 2

Hello, 

I am trying to access and write files in a managed folder on an S3 connector.

Working in a notebook, I got something that works for CSV files, but I cannot find a way to do the same for XLS or PPTX files.

Here is the idea for CSV files:

myrawfile <- dkuManagedFolderDownloadPath("myfolder", '/myfile.csv', as = "raw")
mydata <- read_delim(myrawfile, delim = ";")

When I try something similar for XLS files (read_xls from {readxl}, which is part of the tidyverse):

myrawfile <- dkuManagedFolderDownloadPath("myfolder", '/myfile.xls', as = "raw")
mydata <- read_xls(myrawfile)

or for PPTX files (read_pptx from {officer}):

my_raw_ppt <- dkuManagedFolderDownloadPath("my_folder", '/my_file.pptx', as = "raw")
my_ppt <- read_pptx(my_raw_ppt)

I get an error:

Error in file.exists(path): invalid 'file' argument
Traceback:

1. read_pptx(ppt_bin)
2. file.exists(path)

These functions expect a file path, not the file's raw content.

AndrewM
Dataiker

Hi Pascal,

I've included some example code below that should work for this. The R API cannot give you a local file path for a file stored in an S3-backed managed folder, so we have to copy the file into the notebook's local directory first and then read it from disk. The code below shows this.

library(dataiku)
library(officer)

# Local directory under the notebook's current working directory (getwd());
# `file` is the file you want to read once it has been copied there
directory <- paste0(getwd(), "/myfolderlocal/")
dir.create(directory, showWarnings = FALSE, recursive = TRUE)
file <- paste0(directory, "samplepptx.pptx")

# Copy the folder content to the local directory first:
# there is no API to access folder content on S3 through paths
copy <- dkuManagedFolderCopyToLocal("myfolder", directory)

# Read the PowerPoint file from local disk as usual
mydata <- read_pptx(file)
content <- pptx_summary(mydata)
content
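If you only need a single file, a lighter variant of the same idea is to download just that file as raw bytes (as in the question) and write them to a temporary local file with base R's writeBin(), then hand that path to the reader. This is a minimal sketch, not an official API: read_raw_with is a hypothetical helper name, and it assumes the reader loads the whole file before returning, since the temporary copy is deleted afterwards.

```r
# Hypothetical helper (not part of the Dataiku R API): write a raw vector to a
# temporary local file with the given extension, read it with `reader`, and
# delete the temporary file once `reader` has returned.
read_raw_with <- function(raw_bytes, reader, ext) {
  tmp <- tempfile(fileext = ext)    # e.g. ".xls" or ".pptx"
  on.exit(unlink(tmp), add = TRUE)  # clean up the temporary copy on exit
  writeBin(raw_bytes, tmp)          # raw vector -> file on local disk
  reader(tmp)                       # the reader receives a path, as it expects
}
```

Usage with the calls from the question (assumes {readxl} and {officer} are installed):

xls_raw <- dkuManagedFolderDownloadPath("myfolder", '/myfile.xls', as = "raw")
mydata <- read_raw_with(xls_raw, readxl::read_xls, ".xls")

ppt_raw <- dkuManagedFolderDownloadPath("my_folder", '/my_file.pptx', as = "raw")
my_ppt <- read_raw_with(ppt_raw, officer::read_pptx, ".pptx")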

Thank you.
Andrew
