Help needed regarding Python package installation and usage
Hi,
I am taking the Developer Course. As an infrequent coder (really a non-coder), I am stuck on installing and using Python libraries/packages. One of the course examples reads a PDF in a Python recipe using the tabula library, but I can't run it successfully: the output shows an error saying no module named tabula was found. I installed it via pip from the /bin folder, but I can't see it among the code environment packages listed under Admin > Code Envs. Please help with how to install the needed Python libraries in general, and also with setting up a code env using conda and another Python version (say 3.7), although I was able to create a code env under Administration > Code Envs and set it via the project settings.
Thanks & Best Regards
Sreejith
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
Adding the requirements to a separate code environment is usually preferred over installing packages directly via pip or into your base code env.
See: https://doc.dataiku.com/dss/latest/python/packages.html
In this case, you would want to:
1) Add the requirement "tabula-py" (the PyPI package that provides the tabula module) to a code environment under "Packages to install", then click Save and Update.
2) Change your recipe or notebook to use this code environment.
Also, if you change a code environment and want to use it in a notebook, you will need to restart the notebook kernel so it picks up the latest changes.
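If a recipe or notebook still reports that a module is missing, it can help to check from inside it which interpreter is running and whether that interpreter can import the module. A minimal sketch using only the standard library; the helper name `module_available` is hypothetical, and the idea that the interpreter path reveals the selected code env is an assumption about a typical DSS setup, where each code env is its own environment directory.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


# The interpreter path hints at which environment the kernel/recipe runs in
# (assumption: in DSS this typically points inside the selected code env).
print("Python executable:", sys.executable)
print("Python version:", sys.version.split()[0])

# tabula-py is installed under the name "tabula-py" but imported as "tabula".
print("tabula importable:", module_available("tabula"))
```

If `tabula importable` prints False, the package is not in the code env the recipe is using, even if it was installed elsewhere with pip.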
Answers
-
Sreejith Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer Posts: 12 ✭✭✭✭
Dear Alex,
Thank you so much, it worked.
Best Regards
Sreejith
-
JS Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 Partner
Hello @sreejithkm,
I believe you are using tabula to read tables from PDFs. When I try to use tabula.read_pdf, I get an import error. I just wanted to know whether you installed any other packages, since tabula is a Java-based library.
I would like to know whether tabula.read_pdf can be used to read tables from PDFs.
Thank you!
Regards,
J
-
Sreejith Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer Posts: 12 ✭✭✭✭
Hi,
Please see the working code below.
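import dataiku  # DSS API; not strictly needed just to read the PDF
import pandas as pd
from tabula.io import read_pdf

# Sample PDF from the tabula-py repository
pdf_path = "https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf"

# read_pdf returns a list of DataFrames, one per table found
dfs = read_pdf(pdf_path, stream=True)
print(len(dfs))
dfs[0]  # displays the first table in a notebook

You need tabula-py (which provides the tabula module) in the code environment, and that code environment must be selected for the recipe or notebook.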
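Since tabula-py wraps the Java tabula library, it needs a Java runtime in addition to the Python package, which is one likely cause of errors like the one described above. A minimal sketch, using only the standard library, that reports both prerequisites before you call read_pdf; the helper `tabula_prerequisites` is hypothetical, and looking for `java` on PATH is an approximation of how tabula-py finds Java (some setups configure Java differently).

```python
import importlib.util
import shutil


def tabula_prerequisites() -> dict:
    """Report whether the pieces tabula-py needs appear to be present.

    - "tabula_module": the Python module installed by tabula-py is importable.
    - "java_on_path": a `java` executable is found on PATH (approximation).
    """
    return {
        "tabula_module": importlib.util.find_spec("tabula") is not None,
        "java_on_path": shutil.which("java") is not None,
    }


checks = tabula_prerequisites()
for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'missing'}")
```

If either line prints "missing", fix that first (add tabula-py to the code env, or install Java on the DSS host) before running the read_pdf code.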
Best Regards
Sreejith