API endpoint w/ access to dataframe in memory

adamwelly · April 2021

Hello,

I want to create a python function API endpoint that has access to a large pandas data frame in memory. I do not want to read the same data into a dataframe every time a request is made.

Is this possible? If so, what is the best way to do it?

It would be too slow to do as suggested here:

https://community.dataiku.com/t5/Using-Dataiku-DSS/DSS-API-Designer-Read-dataset-from-DSS-flow-in-R-Python-API/m-p/7543

Thank you,

Adam

fchataigner2 · April 2021

Hi,

You can build the pandas dataframe outside api_py_function(), at the top level of the endpoint code. That way the dataframe is loaded only once, when the endpoint is started, instead of on every request.
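A minimal sketch of that pattern, under some assumptions: the inline CSV stands in for the real data source, and `load_dataframe`, `LOAD_COUNT`, and the `id`/`value` columns are hypothetical names used only to show that the load runs once.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data source (a file, a managed folder, ...).
_CSV = "id,value\n1,10\n2,20\n3,30\n"

LOAD_COUNT = 0  # illustration only: counts how many times the data is read


def load_dataframe():
    """Read the data into a dataframe; this is the expensive step."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return pd.read_csv(io.StringIO(_CSV))


# Top level: runs once when the endpoint code is loaded, not per request.
df = load_dataframe()


def api_py_function(row_id):
    """Per-request handler: only queries the dataframe already in memory."""
    rows = df.loc[df["id"] == row_id, "value"]
    return None if rows.empty else int(rows.iloc[0])
```

Calling `api_py_function` repeatedly leaves `LOAD_COUNT` at 1, because only the function body runs per request.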

adamwelly · April 2021

Thank you for your reply. Yes, it works!

I was having some trouble working out how to actually build it outside api_py_function. What worked best for me was to first write the data to a CSV file in a managed folder, then read that CSV into a dataframe outside the function:

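import os

import pandas as pd

# folders[0] is used here as the path of the managed folder holding the CSV
folder_path = folders[0]
my_csv = os.path.join(folder_path, "my_file.csv")

# Read once, outside the function, so the dataframe stays in memory across requests
df = pd.read_csv(my_csv)

def api_py_function(param):
    # do_something is a placeholder for the actual per-request operation
    return df.do_something(param)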
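Since `do_something` above is only a placeholder, here is one hypothetical way the per-request step might look. It is a sketch, not the actual code: the inline CSV and the `customer_id` and `score` columns are made up for illustration. Setting the index once at load time keeps each request to a single lookup.

```python
import io

import pandas as pd

# Hypothetical data; in the endpoint this would be the CSV in the managed folder.
_CSV = "customer_id,score\nA1,0.5\nB2,0.25\n"

# Loaded and indexed once, outside the function.
df = pd.read_csv(io.StringIO(_CSV)).set_index("customer_id")


def api_py_function(customer_id):
    """Return the score for one customer, or a not-found marker."""
    if customer_id not in df.index:
        return {"found": False}
    row = df.loc[customer_id]
    # Convert the NumPy float to a plain Python float so it serializes cleanly.
    return {"found": True, "score": float(row["score"])}
```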

Thanks!
