API endpoint with access to a dataframe in memory

adamwelly Registered Posts: 2 ✭✭✭

Hello,

I want to create a Python function API endpoint that has access to a large pandas dataframe held in memory. I do not want to read the same data into a dataframe every time a request is made.

Is this possible? If so, what is the best way to do it?

Reading the dataset on every request, as suggested here, would be too slow:

https://community.dataiku.com/t5/Using-Dataiku-DSS/DSS-API-Designer-Read-dataset-from-DSS-flow-in-R-Python-API/m-p/7543

Thank you,

Adam

Answers

  • adamwelly Registered Posts: 2 ✭✭✭

    Thank you for your reply. Yes, it works!

    I had some trouble working out how to build the dataframe outside api_py_function. What worked best for me was to first write the data to a CSV file in a managed folder, then read that CSV into a dataframe outside the function, so it is not re-read on every request:

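    import os

    import pandas as pd

    # Code outside the function is not re-run on every request.
    # `folders` is supplied by the endpoint and lists the managed folder paths.
    folder_path = folders[0]
    my_csv = os.path.join(folder_path, "my_file.csv")
    df = pd.read_csv(my_csv)

    def api_py_function(param):
        # do_something is a placeholder for your own lookup or computation
        return df.do_something(param)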

    Thanks!
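
For anyone who wants to try the load-once pattern from the reply above outside DSS, here is a minimal, self-contained sketch. It is only an illustration under a few assumptions: a temporary directory stands in for the managed folder (on the API node the path would come from `folders`), the sample data and the `id`/`value` columns are made up, and `load_dataframe` and `load_count` are hypothetical helpers added just to show that the CSV is read once and not once per call.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the managed folder path (assumption for this demo only).
folder_path = tempfile.mkdtemp()
my_csv = os.path.join(folder_path, "my_file.csv")

# Hypothetical sample data so the sketch runs anywhere.
pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]}).to_csv(my_csv, index=False)

load_count = 0  # counts how many times the CSV is actually read


def load_dataframe(path):
    """Read the CSV and index it by `id` for fast lookups."""
    global load_count
    load_count += 1
    return pd.read_csv(path).set_index("id")


# Module-level code runs once, when the module is loaded, not per call.
df = load_dataframe(my_csv)


def api_py_function(param):
    """Return the `value` for the row whose id equals `param`, or None."""
    key = int(param)
    if key not in df.index:
        return None
    # Convert the NumPy scalar to a plain float so it serializes cleanly.
    return float(df.at[key, "value"])


if __name__ == "__main__":
    print(api_py_function(2))
    print(api_py_function(99))
    print(load_count)
```

Indexing the dataframe once at load time keeps each request down to a single lookup. The trade-off is that the data is only as fresh as the CSV was when the code was loaded, so the endpoint has to be reloaded to pick up new data.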
