Reading xls and ppt files from a managed folder on an S3 connector

Pascal_B Registered Posts: 10 ✭✭✭✭
edited July 16 in Using Dataiku

Hello,

I am trying to access and write files on an S3 connector.

Working in a notebook, I was able to get something that works for csv files, but I cannot find a way to do the same for xls or ppt files.

Here is the idea for csv files:

myrawfile <- dkuManagedFolderDownloadPath("myfolder", "/myfile.csv", as = "raw")
mydata <- read_delim(myrawfile, delim = ";")

When I try something similar for xls (read_xls from {readxl})

myrawfile <- dkuManagedFolderDownloadPath("myfolder", "/myfile.xls", as = "raw")
mydata <- read_xls(myrawfile)

or with ppt (read_pptx from {officer})

my_raw_ppt <- dkuManagedFolderDownloadPath("my_folder", "/my_file.pptx", as = "raw")
my_ppt <- read_pptx(my_raw_ppt)

I get an error:

Error in file.exists(path): invalid 'file' argument
Traceback:

1. read_pptx(ppt_bin)
2. file.exists(path)

because these functions expect a file path rather than the raw file content.

Answers

  • AndrewM
    AndrewM Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 20 Dataiker

    Hi Pascal,

    I've included some example code below that should work for this. The R API does not have a way to read a file on S3 through a file path, so we have to copy the folder content locally to the notebook first and then read the file from disk. The code below shows this.

    library(dataiku)  # dkuManagedFolderCopyToLocal
    library(officer)  # read_pptx, pptx_summary

    # Local directory under the notebook's current working directory (getwd())
    directory <- paste0(getwd(), "/myfolderlocal/")
    # Path of the file you want to access once the folder has been copied
    file <- paste0(directory, "samplepptx.pptx")

    # Copy to a local folder first: there is no API to access folder content on S3 with paths
    copy <- dkuManagedFolderCopyToLocal("myfolder", directory)

    # Read the PowerPoint as normal from disk
    mydata <- read_pptx(file)
    content <- pptx_summary(mydata)
    content

    Thank you.
    Andrew
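
A possible variation on the same idea, as a minimal sketch: because read_xls and read_pptx need a path, the raw bytes that the question already downloads can be written to a temporary file and that path passed on. The helper name raw_to_tempfile is hypothetical (it is not part of the dataiku package), and the sketch assumes that as = "raw" returns an R raw vector, as the working csv example suggests. The DSS-specific calls are left as comments because they only run inside a DSS notebook.

```r
# Hypothetical helper, not part of the dataiku package:
# write a raw vector to a temporary file and return the file's path.
raw_to_tempfile <- function(raw_bytes, ext) {
  stopifnot(is.raw(raw_bytes))      # assumption: the download returned raw bytes
  path <- tempfile(fileext = ext)   # e.g. ".xls" or ".pptx"
  writeBin(raw_bytes, path)         # write the bytes to disk in binary mode
  path
}

# Intended use inside a DSS notebook (folder and file names taken from the question):
# library(dataiku); library(readxl); library(officer)
# xls_raw <- dkuManagedFolderDownloadPath("myfolder", "/myfile.xls", as = "raw")
# mydata  <- read_xls(raw_to_tempfile(xls_raw, ".xls"))
# ppt_raw <- dkuManagedFolderDownloadPath("my_folder", "/my_file.pptx", as = "raw")
# my_ppt  <- read_pptx(raw_to_tempfile(ppt_raw, ".pptx"))
```

This downloads only the one file that is needed instead of copying the whole folder. The temporary file lives in the R session's temp directory and is removed when the session ends.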
