API Service: Python Prediction Endpoint vs Python Function
Hi, I have a general question about the difference between the python prediction and python function endpoints in the API Service when serving a custom python model. From my understanding, the only advantage that the python prediction endpoint has over the python function endpoint is the ability to automatically include enrichments (which I admit is a pretty important feature). Both can load trained models from managed folders. The python function endpoint, however, has the following advantages:
- The code of a python function endpoint is less verbose than that of a python prediction endpoint. It is nice that the python prediction endpoint exposes its input directly as a pandas dataframe, but the kwargs of a python function can be trivially converted to a dataframe as well
- Input format can be any object that is JSON serializable (doesn't have to adhere to the "features"/ "items" format of the python prediction endpoint)
- Output format can be anything that is JSON serializable (not just the predefined pandas format for the python prediction endpoint)
- Data types are preserved: the feature values arriving in the function keep their JSON types, instead of all being converted to strings as in the python prediction endpoint. This was a pain point for us with the python prediction endpoint, because it meant you couldn't use the same preprocessing pipeline for both training and inference if you had boolean data, for example, due to the way that LabelBinarizer / DictVectorizer work (unless you convert the training data to strings too)
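To make the first and last points concrete, here is a minimal sketch of what I mean. The feature names and the helper `kwargs_to_frame` are made up for illustration and are not part of the Dataiku API; the "stringified" record only mimics the string conversion I described above, it is not taken from the API node itself.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer


def kwargs_to_frame(**kwargs):
    """Hypothetical helper: turn a function endpoint's kwargs into a one-row dataframe."""
    return pd.DataFrame([kwargs])


# Hypothetical feature values, as they might arrive with their JSON types intact.
typed_record = {"is_member": True, "age": 42, "plan": "gold"}
# The same record after every value has been converted to a string.
stringified_record = {key: str(value) for key, value in typed_record.items()}

typed_frame = kwargs_to_frame(**typed_record)
string_frame = kwargs_to_frame(**stringified_record)

# DictVectorizer passes numeric and boolean values through as a single column,
# but one-hot encodes string values as "<feature>=<value>" columns.
typed_names = DictVectorizer().fit([typed_record]).feature_names_
string_names = DictVectorizer().fit([stringified_record]).feature_names_

print(typed_names)   # ['age', 'is_member', 'plan=gold']
print(string_names)  # ['age=42', 'is_member=True', 'plan=gold']
```

A vectorizer fitted on typed training data therefore expects different columns than the ones produced from stringified inputs at inference time, which is why the same pipeline can't be reused without converting one side.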
Would I be correct to assume that if we didn't need enrichments, a python function would probably be the simpler, better choice, and probably a bit faster, since it skips the pre- and post-processing that converts to and from pandas dataframes? Am I missing any other advantages of the python prediction endpoint?
Answers
-
Have a look at this article: https://doc.dataiku.com/dss/latest/apinode/endpoint-python-prediction.html
Our data scientists' preferred approach is to use a Managed Folder, as outlined here.
Hope this helps
Mark
-
Hey Mark, thanks for the response. Given that both the python function endpoint and the python prediction endpoint support using a managed folder, I don't see that as a point of differentiation. Other than enrichments, are there any other advantages to using a python prediction endpoint over a python function endpoint? Are any of the points in my post about the advantages of the python function endpoint invalid?