Examples for custom prediction in API Designer

Registered Posts: 6 ✭✭

Are there any actually useful code examples of using custom prediction in Python?

I have a model in my Flow, and I want to start by using that model to make a prediction just like the Prediction model API endpoint would, then add more custom code on top of that.

The boilerplate code imports dataiku, and from reading the docs it seems this should work:

def predict(self, features_df):
    project = dataiku.Project("My Project")
    model = dataiku.Model("My Model")
    predictor = model.get_predictor(features_df, with_input_cols=True, with_probas=True)
    predictions = predictor.predict(features_df)
    return predictions

But this gives me:

Exception: Default project key is not specified (no DKU_CURRENT_PROJECT_KEY in env)

Googling that error leads to a thread saying I should be using dataikuapi instead of dataiku, which doesn't make sense to me.

Operating system used: Windows


Best Answer

  • Registered Posts: 9 ✭✭
    edited March 20 Answer ✓

    def predict(self, features_df):
        project = dataiku.Project("My Project")
        model = dataiku.Model("My Model")
        predictor = model.get_predictor(features_df, with_input_cols=True, with_probas=True)
        predictions = predictor.predict(features_df)
        return predictions

    The code above will not work properly.

    The approach I tried may not be exactly the behavior you want, but I'll share it anyway.

    1. Create an API service with a Prediction model endpoint for the "My Model" model trained in Dataiku.
    2. In def predict, query the deployed endpoint. Since that endpoint only scores a single record at a time, you may need to loop over the rows with a for loop.
    Then you will be able to get the predictions you want.

    Here is an example implementation of the above:

    from dataiku.apinode import utils


    def predict(self, features_df):
        client = utils.get_self_client()  # client for querying endpoints on this API node
        predictions = []
        for _, row in features_df.iterrows():  # score the DataFrame one row at a time
            result = client.predict_record("My Model_endpoint", row.to_dict())
            predictions.append(result)
        return predictions

    The code above may not be exactly right, but if you follow that concept you should be able to get the results you want.

    I did it using from dataiku.apinode import utils, but you could also send the query directly to the endpoint URL and get the result back:

    import requests

    url = "https://your-endpoint-url"
    data = {"name": "John", "age": 30}
    response = requests.post(url, json=data)
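    For reference, my understanding from the API node REST documentation is that a prediction endpoint is queried at a URL of the form /public/api/v1/&lt;service id&gt;/&lt;endpoint id&gt;/predict, with the record nested under a "features" key. A hedged sketch; the host, port, and service/endpoint ids below are placeholders:

    import requests

    # Assumed URL pattern for an API node prediction endpoint; substitute your
    # own host, port, service id and endpoint id
    url = "https://apinode-host:12000/public/api/v1/my-service/my-endpoint/predict"
    payload = {"features": {"feature1": "value1", "feature2": 42}}
    response = requests.post(url, json=payload)
    print(response.json())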

Answers

  • Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,361 Neuron

    The following thread should help:

    Beginner Help: Deploying an API Service with Pickle Model from Jupyter Notebook in Dataiku

  • Registered Posts: 6 ✭✭
    edited March 13

    Thanks for the reply, but that doesn't seem to help. There aren't any useful code examples anywhere. The person in that thread also wasn't able to figure it out.

    I'm not even trying to use a custom model. I'm attempting to use an existing model trained in my Flow and create a custom API that has the exact same behavior as the "Prediction model" API.

    Is this even possible with Dataiku?

    That thread is the one I was referencing in my original post, and I'm getting the same error.

  • Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,361 Neuron

    What your code is doing is never going to work in the API node, as it is conceptually wrong. If you read my post, it explains why you can't use that code. There are different ways of exposing a model in an API, and which one applies depends on the type of model. The link I posted includes the following link:

    Exposing a Python prediction model — Dataiku DSS 13 documentation

    That page explains how to make a Visual ML model available in an API service. Follow those steps. Alternatively, you can code your own custom Python model or MLflow model and expose those too; see the following link:

    Types of Endpoints — Dataiku DSS 13 documentation

  • Registered Posts: 6 ✭✭
    edited March 13

    Thanks for bearing with me. Maybe I'm blind, but I can't seem to find where it explains "how to make a Visual ML model available in an API service". It seems like the only way to use a Visual ML model is through a Prediction endpoint, which does not allow custom code.

    On the concepts page:

    The API node supports 8 kinds of endpoints:

    • The Prediction or Clustering endpoint to predict or cluster using models created with the DSS Visual Machine Learning component.
    • The Python prediction endpoint to perform predictions using a custom model developed in Python
    • The MLflow Prediction endpoint to predict using imported MLflow models
    • The R prediction endpoint to perform predictions using a custom model developed in R
    • The Python function endpoint to call specific functions developed in Python
    • The R function endpoint to call specific functions developed in R
    • The SQL query endpoint to perform parametrized SQL queries
    • The Dataset lookup endpoint to perform data lookups in one or more DSS datasets

    Doesn't look like there is an endpoint for using DSS Visual ML in Python.

    It sounds like I will have to recreate the Visual ML model in my own Python environment, upload the model to a managed folder, and then maybe I'll be able to load that model through the custom Python prediction endpoint.
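    If I go that route, my sketch of a custom Python prediction endpoint would look roughly like this. I'm assuming the ClassificationPredictor base class and data_folder argument from the "Exposing a Python prediction model" doc, a classifier exported as model.pkl (an assumed filename), and the (predictions, probas) return convention described there:

    import os
    import pickle

    import pandas as pd
    from dataiku.apinode.predict.predictor import ClassificationPredictor

    class MyPredictor(ClassificationPredictor):
        """Custom endpoint serving a classifier pickled into the endpoint's folder."""

        def __init__(self, data_folder=None):
            # data_folder is the local path of the folder attached to the endpoint;
            # "model.pkl" is an assumed filename for the exported classifier
            with open(os.path.join(data_folder, "model.pkl"), "rb") as f:
                self.clf = pickle.load(f)

        def predict(self, features_df):
            # Return a Series of predictions and a DataFrame of per-class probabilities
            predictions = pd.Series(self.clf.predict(features_df))
            probas = pd.DataFrame(self.clf.predict_proba(features_df),
                                  columns=[str(c) for c in self.clf.classes_])
            return (predictions, probas)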

  • Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,361 Neuron

    In principle you are correct. In practice there are ways around it, however these will have some undesired side effects. For instance, why would you want your API node to depend on your Designer node for scoring? That's not usually a good pattern. So why do you feel the need to add custom code? If you need to do query enrichment, you can use the Enrichments section of the API endpoint.

  • Registered Posts: 6 ✭✭

    I see. Maybe my use case is wrong for the application. Basically, before using Dataiku, I manually trained a RandomForest classifier on previous months of data. I also spun up a REST API that loads that model and infers a field. Instead of a singular prediction, we send back the three most probable predictions. This is the custom code I want to add on (see the sketch at the end of this post).

    But my impression was that Dataiku could automate the fetching, training, and API deployment all in one flow, which would be a huge value add since I could automate training on a different month's data or create multiple models for different groups.

    So it sounds like using a Designer node for the API is not a good pattern. Sorry, I'm still a newbie. But the model created from the training recipe looks pretty good.
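    For concreteness, the top-3 logic I mean is roughly this (a sketch assuming a fitted scikit-learn classifier clf):

    import numpy as np

    def top3_predictions(clf, features_df):
        """Return the three most probable class labels per row."""
        probas = clf.predict_proba(features_df)                # shape (n_rows, n_classes)
        top_idx = np.argsort(probas, axis=1)[:, -3:][:, ::-1]  # top-3 indices, best first
        return clf.classes_[top_idx]                           # map indices back to labels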

  • Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,361 Neuron

    OK I see, that seems like a valid use case. It may be possible to customise the code that Dataiku creates for the API service, but this is beyond my knowledge, so maybe others can chip in.

    What certainly is possible is for you to write Python code and hit the model directly, like you were attempting to do in your initial code snippet. While that code can't really run on an API node, as that's a "dumb" headless/projectless node, you could point it at an Automation node instead of a Designer node. And since all you need to get your predictions is pure Python and the Dataiku Python API, you wouldn't need an API node / API service. So while in the traditional Dataiku architecture blueprint the Automation node is used for "batch scoring", nothing says you can't use it for "realtime scoring" too. There are, however, advantages to using the API node, like being scalable, highly available, and able to run fully in Kubernetes. These options are not available on the Designer/Automation nodes.
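    Roughly, that pattern could look like the sketch below. It assumes you point the dataiku package at the remote node with set_remote_dss; the URL, API key, and project key are placeholders:

    import dataiku
    import pandas as pd

    # Point the dataiku package at the Automation (or Designer) node
    dataiku.set_remote_dss("https://automation-node:11200", "YOUR_API_KEY")

    features_df = pd.DataFrame([{"feature1": "value1", "feature2": 42}])

    model = dataiku.Model("My Model", project_key="MYPROJECT")
    predictor = model.get_predictor()
    # with_input_cols / with_probas are options of predict(), not get_predictor()
    scored = predictor.predict(features_df, with_input_cols=True, with_probas=True)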

  • Registered Posts: 6 ✭✭

    Thank you! Sounds like a good suggestion, I will give it a try.

