Hi,
I am taking the Developer Course and, as someone who rarely codes, I am stuck on installing and using Python libraries/packages. One of the course examples reads a PDF in a Python recipe using the tabula library, but it fails for me with an error saying that no module named tabula was found. I installed it via pip from the /bin folder, but it does not appear in the list of packages for the code environment under Admin > codeenv. Please help me understand how to install the needed Python libraries in general, and how to set up a code env using conda with another Python version, say 3.7. I was able to create a code env under Administration > codeenv and select it in the project settings.
Thanks & Best Regards
Sreejith
Hi,
Adding the requirements to a separate code environment is usually preferred over installing packages directly via pip or into your base code env.
See: https://doc.dataiku.com/dss/latest/python/packages.html
In this case, you would want to:
1) Add the requirement "tabula" to a code environment under "Packages to install", then click Save and Update.
2) Change your recipe or Notebook to use this code environment.
Also, if you changed a code environment and are trying to use it in a notebook, you will need to restart the Notebook kernel so that it detects the latest changes.
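To confirm that a recipe or notebook is really running in the code environment you updated, a quick check like the one below can help. It is only a minimal sketch using the Python standard library; the helper name describe_environment is made up for illustration and is not part of DSS or tabula.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether module_name is importable.

    Hypothetical helper for illustration; not part of DSS or tabula.
    """
    # find_spec returns None when the top-level module cannot be found
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "available": spec is not None,
        "location": getattr(spec, "origin", None),
    }


# Run this in the recipe or notebook that raises the import error
print(describe_environment("tabula"))
```

If "available" is False, or the interpreter path does not point into the code environment you expected, the recipe or notebook is most likely still using a different environment or a kernel that has not been restarted.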
Dear Alex,
Thank you so much 🙂 it worked 🙂
Best Regards
Sreejith
Hello @sreejithkm ,
I believe you are using tabula to read tables from PDFs. When I try to use tabula.read_pdf, I get an import error. Since tabula is a Java-based library, I wanted to know whether you had to install any other packages.
I would also like to know whether tabula.read_pdf can be used to read tables from PDFs.
Thank you!
Regards,
J
Hi,
Please see the working code below.
import dataiku
import pandas as pd
from tabula.io import read_pdf
pdf_path = "https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf"
# read_pdf is imported directly above, so call it without the tabula.io prefix
dfs = read_pdf(pdf_path, stream=True)
# read_pdf returns a list of DataFrames
print(len(dfs))
dfs[0]
You need both tabula and tabula-py in the code environment. I hope the right code environment is selected for your recipe or notebook.
Best Regards
Sreejith
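On the Java question: tabula-py is a wrapper around the Java library tabula-java, so read_pdf also needs a Java runtime on the machine running the code, in addition to the Python packages. The snippet below is a rough sketch for checking that, assuming java is expected on the PATH; the function name java_available is made up for illustration.

```python
import shutil
import subprocess


def java_available():
    """Return True if a `java` executable is on PATH and runs successfully.

    Hypothetical helper for illustration; not part of tabula-py or DSS.
    """
    java = shutil.which("java")
    if java is None:
        return False
    try:
        # `java -version` exits with code 0 when the runtime starts correctly
        result = subprocess.run([java, "-version"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Java found:", java_available())
```

If this prints False while the import itself succeeds, the problem is probably a missing Java installation rather than a missing Python package.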