
Design the API node for running a pipeline

Level 3

Hello,  

I have the following case and I would like to hear the best design for it:

I'm trying to create an API that takes one parameter, sets it as a project variable, then runs Scala code that uses the project variable, generates a new dataset, and returns it.

I have done the following:

1- Created an API service in the Design node.

2- In this API, it sets a project variable and then builds the dataset.

3- Returns the result.

 

It works fine in the Design node. However, I would like to use an API node, and I'm not sure how to deploy the pipeline and the API.

Also, I'm not sure about the maximum number of calls to this API. Currently, when I run it on the Design node, it allows only one project build per call.

Could you please advise on how to design this? I have seen the prediction model design, but my case is different since it needs to build a dataset from an input parameter.

 

7 Replies
Dataiker

Hi,

Thanks for explaining your technical requirement. To answer you well, I need to better understand your context. What is the Scala code designed to do in your Design node? How do you want your end users to interact with the results?

API nodes are typically designed to run code in a self-contained and stateless manner. This guarantees low latency, as API nodes are meant for real-time use cases.

Best regards,

Alex

Level 3
Author

Thanks Alex,  

Regarding the Scala code: it uses the project variable provided by the end user, queries some datasets and tables, and then performs some computation.

The result returned to the end user will be a JSON object.

My case is real-time or semi real-time: building the dataset takes 2 or 3 minutes, which is acceptable.

 

Below is something close, in Python code:

In my flow:

[screenshot: dataiku.png]

In my API:

import dataiku

def api_py_function(id):
    # Set the project variable from the request parameter
    project = dataiku.api_client().get_project(dataiku.default_project_key())
    variables = project.get_variables()
    variables["standard"]["id"] = id
    project.set_variables(variables)

    # Build the output dataset and wait for the job to finish
    definition = {"type": "NON_RECURSIVE_FORCED_BUILD",
                  "outputs": [{"id": "testcsv3"}]}
    project.start_job_and_wait(definition)

    # Get the result and return it as JSON
    return dataiku.Dataset("testcsv3").get_dataframe().to_json()

 

Dataiker

Hi,

This type of "semi real-time" requirement is not typically served by an API node. Instead, I would recommend keeping it in the DSS Design node.

Allow me to elaborate 🙂

You can develop a simple webapp for your user to change the variables and run the flow by triggering a scenario. The Dataiku API contains all the necessary methods to perform these actions. Once your scenario has succeeded, you can display the result JSON to the user in the webapp.
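As an illustration of the webapp-backend side, a function along these lines could update the variable and trigger the scenario; this is only a sketch, where the variable name "user_param" and the scenario id "BUILD_RESULT" are hypothetical placeholders, and the `project` handle would come from `dataiku.api_client().get_project(...)`:

```python
def run_build_scenario(project, param_value):
    """Set a project variable, then run the build scenario to completion.

    `project` is a dataikuapi DSSProject handle; "user_param" and
    "BUILD_RESULT" are placeholder names for this sketch.
    """
    variables = project.get_variables()
    # The value becomes available as ${user_param} in the flow
    variables["standard"]["user_param"] = param_value
    project.set_variables(variables)
    scenario = project.get_scenario("BUILD_RESULT")
    # run_and_wait blocks until the scenario run finishes
    return scenario.run_and_wait()
```

Once this returns, the webapp can read the output dataset and render the JSON to the user.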

You can find a built-in example of a webapp using the Dataiku API on this screen:

[screenshot: Screenshot 2020-05-04 at 15.46.52.png]

Hope it helps,

Alex

Level 3
Author

Thanks Alex. We have to create an API because the end users are on another platform, and their developers only need to call an API. Can I make the webapp callable from another platform?

Dataiker

The Dataiku API can be used from outside DSS, using the Python client (if Python is available on your other platform). You can follow the documentation on https://doc.dataiku.com/dss/latest/python-api/rest-api-client/index.html#from-outside-dss.

Alternatively, if Python is not available, you can call directly our REST API: https://doc.dataiku.com/dss/api/7.0/rest/

This Public API covers what you want to achieve: setting project variables, triggering a scenario, waiting for results, and fetching the result dataset.
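For the last of those steps, fetching the result dataset could look like this rough sketch with the Python client; `testcsv3` is the dataset name from the example above, and the `project` handle would come from a `dataikuapi.DSSClient`:

```python
def fetch_result(project, dataset_name="testcsv3"):
    """Stream the rows of the freshly built dataset through the public API
    and return them as a list (suitable for serializing to JSON)."""
    dataset = project.get_dataset(dataset_name)
    return [row for row in dataset.iter_rows()]
```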

Level 3
Author

Thanks Alex. How about deployment? Should I create a separate API node? If yes, should I package my API and bundle my project, then push it to the API node?

Dataiker

In this setup, there is no need for an API node. But to separate development and production environments, you may want an Automation node.

So in short: you prototype your project in the Design node (with the scenario), push it to the Automation node, and finally trigger the project on the Automation node from your external system using the Dataiku API Python client. To authenticate your client, you'll need an API token, which you can generate on the Automation node project.
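From the external system, the connection step could be sketched as below; the host URL, token, and project key are placeholders, and the PyPI package name `dataiku-api-client` is an assumption based on the documentation for using the client outside DSS:

```python
def automation_client(host, api_token):
    """Connect to the Automation node from an external system.

    The import lives inside the function so this sketch stays importable
    even where the client package is not installed
    (install with: pip install dataiku-api-client).
    """
    import dataikuapi
    return dataikuapi.DSSClient(host, api_token)

# Intended usage (placeholder host / token / project key):
# client = automation_client("https://automation.example.com:11200", "API_TOKEN")
# project = client.get_project("MYPROJECT")
```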
