read excel file present in folder

degananda264
degananda264 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 5

I am not able to read an Excel file stored in a Dataiku folder. Could you please share the code, if possible? Thanks in advance.

Answers

  • Miguel Angel
    Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
    edited July 17

    Hi,

    From the Python API for managed folders we can get a handle to interact with the objects inside a folder. However, there is no specific Dataiku method for reading Excel files. One straightforward method to do so would be to use pandas. For example:

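    import io
    import dataiku
    import pandas as pd

    # Handle on the managed folder
    folder=dataiku.Folder('<name of folder>')
    # Replace the placeholder with the file's path inside the folder, as a string
    with folder.get_download_stream(<path of file inside folder>) as f:
      data=f.read()
      # Wrap the raw bytes in a file-like buffer before handing them to pandas
      df=pd.read_excel(io.BytesIO(data),engine='openpyxl')
    print(df)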

    Please note that, depending on your Python and DSS versions and on the file extension, the code may need to be slightly different. Also note that I am using the 'openpyxl' engine for the read, so the code env I am using has this package installed. You may choose a different engine.
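
    To illustrate the point about file extensions, here is a minimal sketch that picks an engine per extension and reads every Excel file found in the folder. The names pick_engine, read_excel_files and ENGINE_BY_EXTENSION are hypothetical helpers made up for this example, not part of the Dataiku API, and the mapping assumes that 'openpyxl' and 'xlrd' are installed in the code env.

```python
import io
import os

# Assumed mapping of extensions to pandas engines; each engine must be
# installed in the code env for the corresponding read to work.
ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def pick_engine(path):
    """Return the engine name for a file path, or None if it is not an Excel file."""
    extension = os.path.splitext(path)[1].lower()
    return ENGINE_BY_EXTENSION.get(extension)


def read_excel_files(folder_name):
    """Read every Excel file in a managed folder into a dict of DataFrames keyed by path."""
    # Imported here so that pick_engine can be used without DSS or pandas available.
    import dataiku
    import pandas as pd

    folder = dataiku.Folder(folder_name)
    frames = {}
    for path in folder.list_paths_in_partition():
        engine = pick_engine(path)
        if engine is None:
            continue  # skip files that are not Excel workbooks
        with folder.get_download_stream(path) as stream:
            # Buffer the bytes so pandas receives a seekable file-like object.
            frames[path] = pd.read_excel(io.BytesIO(stream.read()), engine=engine)
    return frames
```

    Inside DSS, calling read_excel_files('<name of folder>') should return one DataFrame per Excel file, keyed by its path in the folder.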

  • Sv3n-Sk4
    Sv3n-Sk4 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 32 ✭✭✭✭

    Hello @degananda264,

    Did you try the "Create Dataset" option that is available when you select the folder in your Flow?

  • Chocoder
    Chocoder Registered Posts: 1
    edited July 17

    Hi @MiguelangelC! I had the exact same question, and your suggestion worked perfectly for me!

    Thanks a lot,

    PS: maybe you should write your code like this, as the quotes (' ') were missing around the path:

    import io
    import dataiku
    import pandas as pd
    folder=dataiku.Folder('<name of folder>')
    with folder.get_download_stream('<path of file inside folder>') as f:
      data=f.read()
      # Wrap the raw bytes in a file-like buffer before handing them to pandas
      df=pd.read_excel(io.BytesIO(data),engine='openpyxl')
    print(df)
