How to read a file with Python from an HDFS managed folder

milko_ivanov Registered Posts: 1 ✭✭✭✭
Hello

Could you give an example of how to read a CSV file with Python/pandas from an HDFS managed folder?

Thanks

Milko

Answers

  • jereze
    jereze Alpha Tester, Dataiker Alumni Posts: 190 ✭✭✭✭✭✭✭✭

    Hi,

    For files within a managed folder, the API provides only a few interactions. You can use the get_download_stream() method.

    For easier usage, you should consider using datasets, which have more methods available.

    Jeremy
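
A minimal sketch of the get_download_stream() approach for the CSV case in the question, not a definitive implementation. The folder name "my_folder" and the path "/data.csv" are hypothetical placeholders, and the helper names are made up for illustration. The dataiku package is assumed to be importable only inside DSS, so it is imported lazily.

```python
import io

import pandas as pd


def read_csv_from_stream(stream):
    """Parse CSV bytes from a binary file-like object into a DataFrame."""
    # Buffer the whole stream first: a download stream may not support
    # seek(), so pandas is handed an in-memory copy instead.
    return pd.read_csv(io.BytesIO(stream.read()))


def read_csv_from_folder(folder_name, path):
    """Read one CSV file from a DSS managed folder (hypothetical helper)."""
    import dataiku  # assumption: running inside DSS, where this package exists

    folder = dataiku.Folder(folder_name)
    with folder.get_download_stream(path) as stream:
        return read_csv_from_stream(stream)


# Hypothetical usage inside a DSS recipe or notebook:
# df = read_csv_from_folder("my_folder", "/data.csv")
```

Going through the stream rather than a local file path is what should make this work when the folder is stored on HDFS, since there is no local path to open.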

  • Vinothkumar
    Vinothkumar Registered Posts: 17 ✭✭✭✭

    With get_download_stream(), I couldn't read an Excel file. Is there a different way to do that?

    with handle1.get_download_stream('Sites.xlsx') as f:
        data = f.readline()

    It reads as XML content.
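
One likely explanation, offered as an assumption rather than a confirmed diagnosis: an .xlsx file is a ZIP archive of XML parts, so readline() returns raw archive bytes instead of rows. A sketch of a workaround is to buffer the whole stream and pass it to pandas.read_excel, which needs an Excel engine such as openpyxl installed. handle1 and 'Sites.xlsx' come from the post; the helper names are hypothetical.

```python
import io

import pandas as pd


def buffer_stream(stream):
    """Copy a binary download stream into a seekable in-memory buffer."""
    return io.BytesIO(stream.read())


def read_excel_from_stream(stream, **kwargs):
    """Parse an .xlsx workbook from a binary stream into a DataFrame."""
    # read_excel needs random access to the ZIP container, hence the buffer.
    # Assumption: an engine such as openpyxl is installed in the code env.
    return pd.read_excel(buffer_stream(stream), **kwargs)


# Usage with the folder handle from the post (only runs inside DSS):
# with handle1.get_download_stream('Sites.xlsx') as f:
#     sites_df = read_excel_from_stream(f)
```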

  • Vinothkumar
    Vinothkumar Registered Posts: 17 ✭✭✭✭

    To add more details:

    I tried to create a Python script.

    1. First option:
    I tried to read the files available in the folder paths. I can read a file in CSV format, but not one in Excel format, and I am not sure why. It looks like DSS mainly supports TXT and CSV.
    Code:
    with handle1.get_download_stream('/dqs/DQS_Reference Study Sites.csv') as f:
        data = f.readlines()  # Able to read CSV. If I put an Excel file in the same place and try to read it, it comes back as a kind of XML content.

    2. Second option:
    Instead of reading the Excel file via Python, we could create an empty Excel file with specific headers (the same as in the original file) and place it in S3, so that the regular flow is able to run.
    But here again, I am able to write the empty dataframe (columns only) as a CSV file, but I am not able to write an Excel file the same way.
    Code:
    with handle1.get_writer(Filename) as writer:
        writer.write(network_df.to_csv().encode("utf-8"))  # Works fine, but the same with to_excel does not work.

    If either option works, that will solve my problem. Can someone help me here?
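
For the second option, a possible reason to_excel fails where to_csv works: to_excel does not return the file content, it writes into the path or binary buffer it is given. A minimal sketch, assuming an Excel writer engine such as openpyxl or xlsxwriter is installed. dataframe_to_xlsx_bytes is a hypothetical helper name; handle1, Filename and network_df are the names used in the post.

```python
import io

import pandas as pd


def dataframe_to_xlsx_bytes(df):
    """Serialize a DataFrame to .xlsx and return the raw bytes."""
    buffer = io.BytesIO()
    # Write the workbook into memory; leaving the with-block finalizes it.
    with pd.ExcelWriter(buffer) as excel_writer:
        df.to_excel(excel_writer, index=False)
    return buffer.getvalue()


# Usage with the folder handle from the post (only runs inside DSS).
# Filename should end in ".xlsx" so downstream steps detect the format:
# with handle1.get_writer(Filename) as writer:
#     writer.write(dataframe_to_xlsx_bytes(network_df))
```

An empty dataframe that only has column names should still produce a workbook with a header row, which matches the empty-template idea described in the post.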