API endpoint w/ access to dataframe in memory

Solved!
adamwelly
Level 1

Hello,

I want to create a Python function API endpoint that has access to a large pandas dataframe in memory. I do not want to re-read the same data into a dataframe every time a request is made.

Is this possible? If so, what is the best way to do it?

It would be too slow to do it as suggested here:

https://community.dataiku.com/t5/Using-Dataiku-DSS/DSS-API-Designer-Read-dataset-from-DSS-flow-in-R-...

Thank you,

Adam


2 Replies
fchataigner2
Dataiker

Hi,

You can build the pandas dataframe outside api_py_function(). That way the dataframe is loaded only once, when the endpoint starts.
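
For example, here is a minimal sketch of the pattern (the CSV path and the "key" column are just placeholders, to be replaced with your own data source and logic):

import pandas as pd

# Module-level code runs once, when the endpoint starts up
df = pd.read_csv("/path/to/large_file.csv")  # placeholder path

def api_py_function(param):
    # Called on every request; reuses the dataframe already in memory
    return df[df["key"] == param].to_dict(orient="records")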

adamwelly
Level 1
Author

Thank you for your reply. Yes, it works!

I was having some trouble figuring out how to actually build it outside api_py_function(). What worked best for me was to first write the data to a CSV in a managed folder, then read that CSV into a dataframe outside the function:

import pandas as pd
import os

# "folders" is provided by DSS to the endpoint: the local paths of its
# working (managed) folders
folder_path = folders[0]
my_csv = os.path.join(folder_path, "my_file.csv")

# Runs once, at endpoint startup
df = pd.read_csv(my_csv)

def api_py_function(param):
    # Each request reuses the already-loaded dataframe;
    # do_something() stands in for the real per-request logic
    return df.do_something(param)
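
With this setup the CSV is read a single time when the endpoint starts, and every request just reuses the dataframe that is already in memory.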

Thanks!