Help needed regarding python packages installation using it

Solved!
sreejithkm
Level 2
Hi,

I am taking the Developer course. As an infrequent coder (really a non-coder), I am stuck on installing and using Python libraries/packages. One of the course examples reads a PDF in a Python recipe using the tabula library, but I cannot get it to run: the output shows an error saying that no module named tabula was found. I installed it via pip from the /bin folder, but I cannot see it in the list of packages for the code environment under Administration > Code Envs. Please help with how to install the needed Python libraries in general, and also with setting up a code env using conda and another Python version, say 3.7. I was able to create a code env under Administration > Code Envs and set it in the project settings.

Thanks & Best Regards

Sreejith

1 Solution
AlexT
Dataiker

Hi,

Adding the requirements to a separate code environment is usually preferred over installing packages directly via pip or into your base code env.

See: https://doc.dataiku.com/dss/latest/python/packages.html

In this case, you would want to:

1) Add the requirement "tabula" to a code environment under "Packages to install", then click Save and Update.

2) Change your recipe or notebook to use this code environment.

Also, if you changed a code environment and are trying to use it in a notebook, you will need to restart the notebook kernel so that it picks up the latest changes.
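
To confirm that the recipe or notebook is really running in the code environment you expect, a small check like the one below can help. This is only a sketch, not part of DSS: the helper name missing_modules is made up for illustration, and it only tests whether a top-level module can be found by the current interpreter, not whether it works.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter path shows which environment the kernel or recipe is using.
print("Interpreter:", sys.executable)

# "tabula" is the module name imported in this thread; adjust the list as needed.
print("Missing modules:", missing_modules(["tabula", "pandas"]))
```

If "tabula" is reported as missing after you updated the code environment, the notebook kernel has probably not been restarted, or the recipe or notebook is still using another code env.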

4 Replies
sreejithkm
Level 2
Author

Dear Alex,

Thank you so much 🙂 it worked 🙂

Best Regards

Sreejith

JS
Level 1

Hello @sreejithkm ,

I believe you are using tabula to read tables from PDFs. When I try to use tabula.read_pdf, I get an import error. Since tabula is a Java-based library, I wanted to know whether you had to install any other packages.

I would also like to know whether tabula.read_pdf can be used to read tables from PDFs.

Thank you!


Regards,

J

sreejithkm
Level 2
Author

Hi,

Please see below.

This is working code:

import dataiku
import pandas as pd
from tabula.io import read_pdf

# Sample PDF from the tabula-py test resources
pdf_path = "https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf"

# read_pdf is imported by name above, so call it directly.
# It returns a list of DataFrames.
dfs = read_pdf(pdf_path, stream=True)
print(len(dfs))
dfs[0]

You need tabula and tabula-py in the code environment (tabula-py is the package that provides read_pdf). Also make sure that this code environment is selected for the recipe or notebook.
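
On the Java question: tabula-py is a wrapper around tabula-java, so a Java runtime also has to be available on the machine that runs the code. The following is a rough sketch for checking that from Python. The helper name java_version_banner is made up for illustration, and it assumes that java should be reachable on the PATH of the process running the recipe or notebook.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the output of `java -version`, or None if java is not on the PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` usually writes its banner to stderr, not stdout.
    return (result.stderr or result.stdout).strip()


banner = java_version_banner()
if banner is None:
    print("No java executable found on PATH")
else:
    print(banner)
```

If no java executable is found, read_pdf is likely to fail even though the Python packages are installed.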

Best Regards

Sreejith