Read PDF with tabula on S3
EdBerth
Hi,
I am following this tutorial on working with PDFs and managed folders:
https://knowledge.dataiku.com/latest/code/managed-folders/tutorial-managed-folders.html
However, reading the PDF with tabula doesn't work. I get this error message:
UnsupportedOperation: seek
My managed folder is on S3. How can I read this file?
Best Answer
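Can you try adding this at the beginning of the code:
import io
and reading the PDF as follows:
tables = read_pdf(io.BytesIO(stream.read()), pages = "12-26", multiple_tables = True)
instead of:
tables = read_pdf(stream, pages = "12-26", multiple_tables = True)
The error suggests that the stream returned for an S3 managed folder does not support seek, which read_pdf needs. Loading the bytes into an io.BytesIO buffer first gives it a seekable, in-memory object.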
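To illustrate why wrapping the stream helps, here is a minimal sketch that does not need Dataiku or tabula installed. NonSeekableStream and to_seekable are hypothetical names made up for this example. The class only imitates a download stream that refuses to seek, which is what the error message suggests the S3 stream does.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a remote download stream that cannot seek."""

    def __init__(self, payload: bytes):
        self._inner = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        # Copy the next chunk of the payload into the caller's buffer.
        data = self._inner.read(len(buffer))
        buffer[: len(data)] = data
        return len(data)


def to_seekable(stream) -> io.BytesIO:
    """Read the whole stream into memory and return a seekable buffer."""
    return io.BytesIO(stream.read())


if __name__ == "__main__":
    raw = NonSeekableStream(b"%PDF-1.4 demo bytes")
    try:
        raw.seek(0)
    except io.UnsupportedOperation as exc:
        print("seek failed:", exc)

    buffered = to_seekable(NonSeekableStream(b"%PDF-1.4 demo bytes"))
    buffered.seek(0)
    print("seekable after buffering:", buffered.seekable())
```

In the tutorial code, the same idea would be passing to_seekable(stream) (or io.BytesIO(stream.read()) directly) to read_pdf. Note that this loads the whole PDF into memory, which is usually fine for ordinary documents but worth keeping in mind for very large files.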