Design the API node for running a pipeline
Hello,
I have the following case and I would like to hear the best design for it:
I'm trying to create an API that takes one parameter, sets it as a project variable, and then runs Scala code which uses the project variable to generate a new dataset and return it.
I have done the following:
1- Created an API service in the Design node
2- In this API, set a project variable and then build the dataset
3- Return the result
It works fine in the Design node. However, I would like to create an API node, and I'm not sure how to deploy the pipeline and the API.
Also, I'm not sure about the maximum number of calls to this API. Currently, when I run it in the Design node, it only allows one project build per call.
Could you please advise on how to design this? I have seen the prediction model design, but my case is different since it needs to build a dataset from an input parameter.
Best Answer
-
In this setup, there is no need for API nodes. But to separate development and production environments, you may need an Automation node.
So in short: you prototype your project in the Design node (with the scenario), push it to the Automation node, and finally trigger the project on the Automation node from your external system using the Dataiku API Python client. To authenticate your client, you'll need an API token, which you can generate on the Automation node project.
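For illustration, here is a minimal sketch of that external trigger using the dataikuapi Python client; the host URL, token placeholder, project key, and scenario id are all assumptions, not values from this thread:

import dataikuapi

# Connect to the Automation node (URL and API token are placeholders)
client = dataikuapi.DSSClient("https://automation-node.example.com:11200", "YOUR_API_TOKEN")
project = client.get_project("MY_PROJECT")

# Trigger the scenario that builds the dataset, and block until it finishes
scenario = project.get_scenario("BUILD_RESULT")
scenario.run_and_wait()  # raises an error if the scenario run fails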
Answers
-
Hi,
Thanks for explaining your technical requirement. To answer you best, I need to understand your context a bit more. What is the Scala code in your Design node designed to do? How do you want your end-users to interact with the results?
API nodes are typically designed to run code in a self-contained, stateless manner. This is to guarantee low latency, as API nodes are meant for real-time use cases.
Best regards,
Alex
-
Thanks Alex,
Regarding the Scala code: it uses the project variable provided by the end-user, then queries some datasets and tables, and then performs some computation.
The result returned to the end-user is going to be a JSON object.
My case is real-time or semi-real-time: building the dataset takes 2 or 3 minutes, which is acceptable.
Below is something close to it in Python code:
In my flow:
In my API:
def api_py_function(id):
    # Setting project var (the variable update itself is elided here)
    definition = {"type": "NON_RECURSIVE_FORCED_BUILD",
                  "outputs": [{"id": "testcsv3"}]}
    job = project.start_job_and_wait(definition)
    # Getting the result
    return result  # pseudocode: read the built dataset and return it
-
Hi,
This type of "semi real-time" requirement is not typically answered by the API node. Instead, I would recommend keeping it in the DSS Design node.
Allow me to elaborate.
You can develop a simple webapp for your users to change the variables and run the flow by triggering a scenario. The Dataiku API contains all the necessary methods to perform these actions. Once your scenario has succeeded, you can display the result JSON to the user in the webapp.
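As a rough sketch of what the webapp's Python backend could do (the variable name, scenario id, and dataset name below are assumptions, not from this thread):

import dataiku

# Runs inside DSS, e.g. in the Python backend of a webapp
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

def run_flow(user_value):
    # Store the user's input as a project variable
    variables = project.get_variables()
    variables["standard"]["id"] = user_value
    project.set_variables(variables)

    # Trigger the scenario that builds the dataset and wait for it
    project.get_scenario("BUILD_RESULT").run_and_wait()

    # Read the built dataset and return it as JSON for display
    df = dataiku.Dataset("testcsv3").get_dataframe()
    return df.to_json(orient="records")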
You can find a built-in example of a webapp using the Dataiku API among the webapp templates in DSS.
Hope it helps,
Alex
-
Thanks Alex. We have to create an API because the end-users will use another platform, and its developers only need to call an API. Can I make the webapp callable by another platform?
-
The Dataiku API can be used from outside DSS, using the Python client (if Python is available on your "other platform"). You can follow the documentation at https://doc.dataiku.com/dss/latest/python-api/rest-api-client/index.html#from-outside-dss.
Alternatively, if Python is not available, you can call our REST API directly: https://doc.dataiku.com/dss/api/7.0/rest/
This Public API covers what you want to achieve: setting project variables, triggering a scenario, waiting for results, and fetching the result dataset.
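For example, from outside DSS with the Python client, the whole sequence could look like this sketch (the host, token, project key, scenario id, variable name, and dataset name are all placeholders):

import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_TOKEN")
project = client.get_project("MY_PROJECT")

# 1. Set the project variable from the caller's parameter
variables = project.get_variables()
variables["standard"]["id"] = "some_value"
project.set_variables(variables)

# 2. Trigger the scenario and wait for the build (2-3 minutes in this case)
project.get_scenario("BUILD_RESULT").run_and_wait()

# 3. Fetch the rows of the result dataset
rows = list(project.get_dataset("testcsv3").iter_rows())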
-
Thanks Alex. How about deployment? Should I create a separate API node? If yes, should I package my API and bundle my project, then push them to the API node?