Read excel file using Python Pandas

Solved!
pafj
Level 2
Hi,

I searched the community but couldn't find a solution to my issue. I'm trying to import an Excel file from an SFTP-backed folder using a Python recipe.

I'm using the following code:

import dataiku
import pandas as pd
import numpy as np
from dataiku import pandasutils as pdu


FOLDER_NAME = 'folder_1'
FILE_NAME = 'file_1.xlsx'
DATASET_NAME = 'dataset_1'

folder = dataiku.Folder(FOLDER_NAME)
with folder.get_download_stream(FILE_NAME) as f:
    df = pd.read_excel(f)

 

After running the above, I get the following error:

UnsupportedOperation: seek

 

1 Solution
VitaliyD
Dataiker

Hi @pafj,

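The download stream is not seekable, which is why pandas raises the seek error. You need to read the file into memory first. Please refer to the code below:

import io

with folder.get_download_stream(FILE_NAME) as f:
    data = f.read()  # download the whole file into memory as bytes
df = pd.read_excel(io.BytesIO(data))  # BytesIO gives pandas a seekable buffer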

It should work now.
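
To illustrate why reading the file first helps, here is a minimal, standard-library-only sketch. `NonSeekableStream` and `to_seekable` are hypothetical names made up for this illustration; the class is only a stand-in for a network download stream, not Dataiku's actual implementation.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a download stream: readable, but not seekable."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, b) -> int:
        # Copy the next chunk of the payload into the caller's buffer.
        chunk = self._inner.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def to_seekable(stream) -> io.BytesIO:
    """Read the whole stream into memory and return a seekable buffer."""
    return io.BytesIO(stream.read())


stream = NonSeekableStream(b"pretend these are xlsx bytes")
try:
    stream.seek(0)  # readers that need random access call seek()
except io.UnsupportedOperation as exc:
    print("UnsupportedOperation:", exc)

buffer = to_seekable(NonSeekableStream(b"pretend these are xlsx bytes"))
buffer.seek(0)  # works: the in-memory copy supports seeking
```

A buffer like this can be passed to pd.read_excel in place of the raw stream, at the cost of holding the whole file in memory.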

Best

pafj
Level 2
Author

Hi @VitaliyD,

Thank you for the reply. I tried your code and I'm getting a new error:

XLRDError: Excel xlsx file; not supported

I tried installing the xlrd==2.0.1 and openpyxl==3.0.9 packages, but even after installation I'm still getting the error.

VitaliyD
Dataiker

Hi @pafj ,

If you don't specify an engine, older pandas versions use xlrd by default. xlrd dropped support for anything other than .xls files in version 2.0 (docs), so to read .xlsx files with the xlrd engine you need xlrd<2.0 (for example, xlrd==1.2.0) in your code env. Otherwise, specify the openpyxl engine:

df = pd.read_excel(temp_path, engine='openpyxl')
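
If a folder mixes .xls and .xlsx files, the engine could be picked from the file extension. The helper below is only a sketch of that idea. `pick_excel_engine` is a hypothetical name, not part of pandas or Dataiku, and the extension mapping is an assumption based on the formats each engine is known to read.

```python
import os

# Assumed mapping: openpyxl for the XML-based formats, xlrd for legacy .xls.
ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def pick_excel_engine(file_name: str) -> str:
    """Return the pandas engine name to use for the given Excel file name."""
    extension = os.path.splitext(file_name)[1].lower()
    try:
        return ENGINE_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"Unsupported Excel extension: {extension!r}")


print(pick_excel_engine("file_1.xlsx"))
```

The result can then be passed as the engine argument, for example engine=pick_excel_engine(FILE_NAME), so the engine is always explicit rather than left to the pandas default.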

 

Best.