read pdf with tabula on S3

Solved!
EdBerth
Level 2
Hi,

I am following this tutorial to work with PDFs and managed folders:

https://knowledge.dataiku.com/latest/code/managed-folders/tutorial-managed-folders.html

But reading the PDF with tabula doesn't work; I get this error message:

UnsupportedOperation: seek

My managed folder is on S3. How can I read this file?


1 Solution
CatalinaS
Dataiker

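Can you try adding this at the beginning of the code:

import io

and then reading the PDF as follows:

tables = read_pdf(io.BytesIO(stream.read()), pages="12-26", multiple_tables=True)

instead of:

tables = read_pdf(stream, pages="12-26", multiple_tables=True)

Wrapping the bytes in io.BytesIO gives read_pdf an in-memory file object that supports seek, which the stream itself does not.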
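For anyone who wants to see why the wrapper helps, here is a minimal sketch that reproduces the situation with the standard library only, without Dataiku or tabula. `NonSeekableStream` and `to_seekable` are hypothetical names made up for this illustration; the class is only a stand-in for the kind of stream a remote (for example S3-backed) folder might return, and is not the actual Dataiku object.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a remote download stream that cannot seek."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        # Remote streams are typically read front to back only.
        return False

    def readinto(self, buffer) -> int:
        # Delegate the actual byte copying to the in-memory payload.
        return self._inner.readinto(buffer)


def to_seekable(stream) -> io.BytesIO:
    """Read the whole stream into memory and return a seekable buffer."""
    return io.BytesIO(stream.read())


if __name__ == "__main__":
    raw = NonSeekableStream(b"%PDF-1.4 demo bytes")
    try:
        raw.seek(0)
    except io.UnsupportedOperation as exc:
        print("seek failed on the raw stream:", exc)

    buffered = to_seekable(NonSeekableStream(b"%PDF-1.4 demo bytes"))
    buffered.seek(0)
    print("first bytes after seek:", buffered.read(4))
```

In the recipe, the equivalent would be to pass `io.BytesIO(stream.read())` to `read_pdf`, as in the answer above. Note that this loads the whole file into memory, which is fine for ordinary PDFs but may matter for very large ones.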
