Read excel file using Python Pandas

Pasha
Pasha Dataiku DSS Core Designer, Registered Posts: 13 ✭✭✭
edited July 16 in Using Dataiku

Hi,

I went through the community and I couldn't find a solution for my issue when I try to import excel files from SFTP using python recipe:

I'm using the following code:

import dataiku
import pandas as pd
import numpy as np
from dataiku import pandasutils as pdu


FOLDER_NAME = 'folder_1'
FILE_NAME = 'file_1.xlsx'
DATASET_NAME = 'dataset_1'

folder = dataiku.Folder(FOLDER_NAME)
with folder.get_download_stream(FILE_NAME) as f:
    df = pd.read_excel(f)

After running the above i get the following error:

UnsupportedOperation: seek

Tagged:

Best Answer

  • Vitaliy
    Vitaliy Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker
    edited July 17 Answer ✓

    Hi @pafj
    ,

    You need to read the file first, please refer to the code below:

    with folder.get_download_stream(FILE_NAME) as f:
        data = f.read()
        df = pd.read_excel(data)

    It should work now.

    Best

Answers

  • Pasha
    Pasha Dataiku DSS Core Designer, Registered Posts: 13 ✭✭✭
    edited July 17

    Hi @VitaliyD
    ,

    Thank you for the reply. I tried your code and I'm getting a new error:

    XLRDError: Excel xlsx file; not supported

    I tried to install xlrd==2.0.1 and openpyxl==3.0.9 packages, however even after installation, i'm still getting the error.

  • Vitaliy
    Vitaliy Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker
    edited July 17

    Hi @pafj
    ,

    If you won't specify an engine to use, the xlrd is used by default. xlrd removed support for anything other than .xls files from version 2.0 (docs), hence you will need to use xlrd<1.2.0 in your code env to be able to read the xlsx files with xlrd engine. Otherwise, you will need to specify openpyxl engine to use:

    df = pd.read_excel(temp_path, engine='openpyxl')

    Best.

  • Nainish09
    Nainish09 Registered Posts: 7

    Hi @VitaliyD

    Thanks for your answer, it helped me to read the files from my s3 bucket folder too.

    But now i am facing one issue that, i have multiple process excel files, I have to write them back to same folder in s3 bucket, how can i do it? without specifying any access key or folder path if possible.

  • Vitaliy
    Vitaliy Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker

    Hi,

    You would need to create a managed folder in the s3 bucket and use Dataiku managed folders API upload methods (based on your use case) to upload them to the managed folder.

    Best,

    Vitaliy

Setup Info
    Tags
      Help me…