Added on March 13, 2025 3:02PM
Are there any actual useful code examples of using custom prediction in python?
I have a model that exists in my Flow, and I want to use it to make a prediction just like the Prediction model API endpoint would, as a starting point, and then add more custom code on top of that.
The boilerplate code imports dataiku, and from reading the docs it seems this should work:
def predict(self, features_df):
    project = dataiku.Project("My Project")
    model = dataiku.Model("My Model")
    predictor = model.get_predictor(features_df, with_input_cols=True, with_probas=True)
    predictions = predictor.predict(features_df)
    return predictions
But this gives me:
Exception: Default project key is not specified (no DKU_CURRENT_PROJECT_KEY in env)
Googling that leads to a thread saying I should be using dataikuapi instead of dataiku, which doesn't make sense.
Operating system used: Windows
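For context, that error just means the dataiku package has no current project to default to. A hedged sketch of one possible workaround when the code is not running inside a DSS-managed process (the URL, API key, and project key below are placeholders):

import dataiku

# Point the dataiku package at the DSS instance explicitly and pick a default
# project, so calls like dataiku.Model(...) know where to look.
dataiku.set_remote_dss("https://my-dss-instance:11200", "my-api-key")
dataiku.set_default_project_key("MY_PROJECT")

Note that doing this inside an API node endpoint makes the endpoint depend on another DSS node, which is one of the side effects discussed further down this thread.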
def predict(self, features_df):
    project = dataiku.Project("My Project")
    model = dataiku.Model("My Model")
    predictor = model.get_predictor(features_df, with_input_cols=True, with_probas=True)
    predictions = predictor.predict(features_df)
    return predictions
The above code will not work properly.
The approach I tried may not be exactly the behavior you want, but I will share it.
1. Create an API service with the model named "My Model" (the model built in Dataiku) as a Prediction model endpoint.
2. In def predict, query the deployed endpoint. Since it only handles single-record inference, you may have to loop over the records with a for statement.
Then you should be able to get the predictions you want.
Here is an example implementation of the above.

from dataiku.apinode import utils

def predict(self, features_df):
    client = utils.get_self_client()
    predictions = []
    for _, row in features_df.iterrows():  # process the dataframe one row at a time
        result = client.predict_record("My Model_endpoint", row.to_dict())
        predictions.append(result)
    return predictions

The code above may not be exactly accurate, but with this concept you should be able to get the results you want.
I did this with from dataiku.apinode import utils, but you could also send the query directly to the endpoint URL and get the results that way.

import requests

url = "https://<endpoint-url>"
data = {"name": "John", "age": 30}
response = requests.post(url, data=data)
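If you go the direct-HTTP route against an API node prediction endpoint, the URL and payload have a specific shape; a hedged sketch, assuming the usual /predict URL layout and a "features" wrapper (double-check against the sample queries shown for your deployed endpoint):

import requests

# Placeholders: API node base URL, API service id, and endpoint id
url = "https://my-apinode:12000/public/api/v1/my_service/my_endpoint/predict"
payload = {"features": {"feature1": "value1", "feature2": 42}}

response = requests.post(url, json=payload)
response.raise_for_status()
result = response.json().get("result", {})  # assumed response layout
print(result.get("prediction"), result.get("probas"))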
The following thread should help:
Thanks for the reply, but it doesn't seem to help. There aren't any useful code examples anywhere. The person in that thread also wasn't able to figure it out.
I'm not even trying to use a custom model. I'm attempting to use an existing model trained in my flow and create a custom api that has the exact same behavior as the "Prediction model" api.
Is this even possible with dataiku?
That thread is the one I was referencing in my original post, and I'm getting the same error.
What your code is doing is never going to work in the API as it is conceptually wrong. If you read my post, it explains why you can't use that code. There are different ways of exposing a model in an API, and that will depend on the type of model. The link I posted includes the following link:
Which explains how to make a Visual ML model available in an API service. Follow those steps. Alternatively, you can code your own custom Python model or MLflow model and expose those too; see the following links:
Thanks for bearing with me. Maybe I'm blind, but I can't seem to find where it explains "how to make a Visual ML model available in an API service". It seems like the only way to use a Visual ML model is through a Prediction endpoint, which does not allow custom code.
On the concepts page:
The API node supports 7 kinds of endpoints:
Doesn't look like there is an endpoint for using DSS Visual ML in Python.
It sounds like I will have to recreate the Visual ML model in my own Python environment, upload the model to a managed folder, and then maybe I'll be able to load that model through the custom Python prediction endpoint.
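A hedged sketch of what that could look like in a custom Python prediction endpoint, assuming a scikit-learn model serialized into the endpoint's working folder; the base class, the data_folder argument, and the file name are assumptions to check against the skeleton Dataiku generates:

import os
import joblib
import pandas as pd
from dataiku.apinode.predict.predictor import ClassificationPredictor  # assumed import path

class MyPredictor(ClassificationPredictor):
    def __init__(self, data_folder=None):
        # data_folder: local path of the working folder declared on the endpoint (assumption)
        self.clf = joblib.load(os.path.join(data_folder, "model.joblib"))

    def predict(self, features_df):
        # Return one decision per record, plus per-class probabilities
        decisions = pd.Series(self.clf.predict(features_df), index=features_df.index)
        probas = pd.DataFrame(self.clf.predict_proba(features_df),
                              columns=self.clf.classes_, index=features_df.index)
        return decisions, probas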
In principle you are correct. In practice there are ways around it, however these will have some undesired side effects. For instance, why would you want to have your API node depend on your Designer node for scoring? That's not usually a good pattern. So why do you feel the need to add custom code? If you need to do query enrichment, you can use the Enrichments section of the API endpoint.
I see. Maybe my use case is wrong for the application. Basically, before using Dataiku, I manually trained a RandomForest classifier on a previous month's data. I also spun up a REST API that loads that model and predicts a field. Instead of a single prediction, we send back the 3 most probable predictions. This is the custom code I want to add on.
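For what it's worth, the top-3 part itself is only a few lines on top of predict_proba; a minimal sketch, assuming a fitted scikit-learn classifier named clf and a features dataframe:

import numpy as np

# One row per record, one column per class (column order given by clf.classes_)
probas = clf.predict_proba(features_df)

# Indices of the 3 highest-probability classes per record, best first
top3_idx = np.argsort(probas, axis=1)[:, ::-1][:, :3]
top3_labels = clf.classes_[top3_idx]                        # class names, shape (n_records, 3)
top3_probas = np.take_along_axis(probas, top3_idx, axis=1)  # matching probabilities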
But my impression was that Dataiku could automate the fetching, training, and API deployment all in one flow, which would be a huge value add since I could automate training on a different month's data or create multiple models for different groups.
So it sounds like using a Designer node for the API is not a good pattern. Sorry, I'm still a newbie. But the model created from the training recipe looks pretty good.
OK I see, that seems like a valid use case. It may be possible to customise the code that Dataiku creates for the API service, but this is beyond the knowledge I have, so maybe others can chip in.
What certainly is possible is for you to write Python code and hit the model directly, like you were attempting to do in your initial code snippet. While that code can't really talk to an API node, as that's a "dumb" headless/projectless node, you could point it to an Automation node instead of a Designer node. And since all you need to get your predictions is pure Python and the Dataiku Python API, you wouldn't need an API node / API service. So while in the traditional Dataiku architecture blueprint the Automation node is used for "batch scoring", nothing says you can't use it for "realtime scoring" too. There are, however, advantages to using the API node, like being scalable, highly available, and able to run fully in Kubernetes. These options are not available in the Designer/Automation node.
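In that setup the original snippet only needs small adjustments; a hedged sketch, assuming it runs in a Python recipe or notebook on the node hosting the project (so the project context is already set) and that the feature names are placeholders:

import dataiku
import pandas as pd

# Placeholder input; in practice this comes from your own request handling or a dataset
features_df = pd.DataFrame([{"feature1": "value1", "feature2": 42}])

model = dataiku.Model("My Model")      # look up the saved model in the current project
predictor = model.get_predictor()      # get_predictor takes no features argument
predictions = predictor.predict(features_df, with_input_cols=True, with_probas=True)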
Thank you! Sounds like a good suggestion, I will give it a try.