reading xls and ppt from a managed folder on a S3 connector


I am trying to access and write files in a managed folder on an S3 connection.

Working in a notebook, I was able to get something that works for csv files, but I cannot find a way to do the same for xls or ppt files.

Here is what works for csv files:

myrawfile <- dkuManagedFolderDownloadPath("myfolder", "/myfile.csv", as = "raw")
mydata <- read_delim(myrawfile, delim = ";")

When I try something similar for xls (read_xls from {readxl}, installed with the tidyverse)

myrawfile <- dkuManagedFolderDownloadPath("myfolder", "/myfile.xls", as = "raw")
mydata <- read_xls(myrawfile)

or with ppt (read_pptx from {officer})

my_raw_ppt<- dkuManagedFolderDownloadPath("my_folder", '/my_file.pptx', as = "raw")
my_ppt <- read_pptx(my_raw_ppt) 

I get an error :

Error in file.exists(path): invalid 'file' argument

1. read_pptx(ppt_bin)
2. file.exists(path)

because these functions expect a file path rather than the raw file content.


Hi Pascal,

I've included some example code below that should work for this. The R API cannot expose a file in an S3-backed managed folder as a local file path, so we have to copy the folder content to the notebook's local disk first and then read the file from there. The code below shows this.

library(dataiku)
library(officer)

# Local target directory, created under the notebook's current working directory
directory <- file.path(getwd(), "myfolderlocal")

# Copy the managed folder content locally first; there is no API to access
# folder content on S3 through paths
dkuManagedFolderCopyToLocal("my_folder", directory)

# Read the PowerPoint from disk as usual. This assumes the copy keeps the
# file name from the question ("my_file.pptx") at the root of the local directory
my_ppt <- read_pptx(file.path(directory, "my_file.pptx"))
content <- pptx_summary(my_ppt)
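
If you only need a single file and would rather not copy the whole folder, another option is to write the raw bytes you already download to a temporary file and pass that path to the reader. This is only a rough sketch: `raw_to_tempfile` is a hypothetical helper, not part of the dataiku package, and it assumes that `dkuManagedFolderDownloadPath(..., as = "raw")` returns a raw vector, as it appears to in your csv example. The demo bytes are a stand-in so the snippet runs outside Dataiku.

```r
# Hypothetical helper (not part of the dataiku package): write a raw vector
# to a temporary file and return its path, so path-based readers can open it.
raw_to_tempfile <- function(raw_bytes, ext) {
  stopifnot(is.raw(raw_bytes))
  path <- tempfile(fileext = ext)  # keep the extension, some readers check it
  writeBin(raw_bytes, path)
  path
}

# Stand-in bytes so the sketch runs outside Dataiku.
demo_bytes <- charToRaw("a;b\n1;2\n")
demo_path <- raw_to_tempfile(demo_bytes, ".csv")
demo_back <- readBin(demo_path, what = "raw", n = file.size(demo_path))

# In a DSS notebook (assumption: same folder and file names as in the question):
# my_raw_ppt <- dkuManagedFolderDownloadPath("my_folder", "/my_file.pptx", as = "raw")
# my_ppt <- officer::read_pptx(raw_to_tempfile(my_raw_ppt, ".pptx"))
```

The same helper should work for xls files by passing ".xls" as the extension and handing the returned path to read_xls.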

Thank you.
