Design the API node for running a pipeline

Bader
Bader Registered Posts: 46 ✭✭✭✭✭

Hello,

I have the following use case and would like advice on the best design for it:

I'm trying to create an API that takes one parameter and sets it as a project variable. It then runs Scala code that uses the project variable, generates a new dataset, and returns it.

I have done the following :

1- Created an API service in the Design node

2- In this API, set a project variable and then build the dataset

3- Return the result

It works fine in the Design node. However, I would like to create an API node, and I'm not sure how to deploy the pipeline and the API.

Also, I'm not sure about the maximum number of calls to this API. Currently, when I run it on the Design node, it only allows one project build per call.

Could you please advise on how to design this? I have seen the prediction model design, but my case is different since it needs to build a dataset from an input parameter.

Best Answer

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Answer ✓

    In this setup, there is no need for API nodes. However, to separate the development and production environments, you may need an Automation node.

    So in short: you prototype your project in the Design node (with the scenario), push it to the Automation node, and finally trigger the project on the automation node using the Dataiku API Python client on your external system. To authenticate your client, you'll need an API token, which you can generate on the Automation node project.
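
As a minimal sketch of the external-system side described above, the helper below sets project variables, runs a scenario, and reads back the output dataset. It assumes the `dataikuapi` Python client package is installed on the external system. The host URL, project key, scenario ID, and variable name in the usage comment are hypothetical placeholders; `testcsv3` is the dataset name used later in this thread.

```python
def run_pipeline(client, project_key, scenario_id, output_dataset, variables):
    """Set project variables, run a scenario, and return the output rows as dicts."""
    project = client.get_project(project_key)

    # Merge the caller's values into the project's standard variables.
    project_vars = project.get_variables()
    project_vars["standard"].update(variables)
    project.set_variables(project_vars)

    # Blocks until the scenario finishes (assumed to raise if the run fails).
    project.get_scenario(scenario_id).run_and_wait()

    # Read the rebuilt dataset back as a list of {column: value} dicts.
    dataset = project.get_dataset(output_dataset)
    columns = [col["name"] for col in dataset.get_schema()["columns"]]
    return [dict(zip(columns, row)) for row in dataset.iter_rows()]


def connect(host, api_key):
    """Create a client for the Automation node."""
    import dataikuapi  # imported lazily so run_pipeline can be exercised without it
    return dataikuapi.DSSClient(host, api_key)


# Hypothetical usage (placeholder host, key, and identifiers):
# client = connect("https://automation.example.com:11200", "YOUR_API_KEY")
# rows = run_pipeline(client, "MYPROJECT", "BUILD_RESULT", "testcsv3",
#                     {"customer_id": "42"})
```

Project variables are shared project-wide state, so two callers running this at the same time would overwrite each other's values; serializing calls on the caller's side is one way to avoid that.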

Answers

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    Thanks for explaining your technical requirement. To best answer you, I need to better understand your context. What is the Scala code designed to do in your Design node? How do you want your end users to interact with the results?

    API nodes are typically designed to run code in a self-contained and stateless manner. This guarantees low latency, as API nodes are meant for real-time use cases.

    Best regards,

    Alex

  • Bader
    Bader Registered Posts: 46 ✭✭✭✭✭
    edited July 17

    Thanks Alex,

    Regarding the Scala code: it uses the project variable supplied by the end user, queries some datasets and tables, and then performs some computation.

    The reply to the end user is going to be a JSON object.

    My case is real-time or semi-real-time: building the dataset takes 2 or 3 minutes, which is acceptable.

    Below is something close, in Python:

    In my flow:

    dataiku.png

    In my API:

    def api_py_function(id):
        ## Setting project var

        # `project` is assumed to be a project handle obtained earlier
        definition = {
            "type": "NON_RECURSIVE_FORCED_BUILD",
            "outputs": [{"id": "testcsv3"}],
        }
        job = project.start_job_and_wait(definition)

        ## getting the result

        # return the result

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    This type of "semi-real-time" requirement is not typically addressed by the API node. Instead, I would recommend keeping it in the DSS Design node.

    Allow me to elaborate:

    You can develop a simple webapp for your user to change the variables and run the flow by triggering a scenario. The Dataiku API contains all the necessary methods to perform these actions. Once your scenario has succeeded, you can display the result JSON to the user in the webapp.

    You can find a built-in example of webapp using the Dataiku API on this screen:

    Screenshot 2020-05-04 at 15.46.52.png

    Hope it helps,

    Alex

  • Bader
    Bader Registered Posts: 46 ✭✭✭✭✭

    Thanks Alex. We have to create an API because the end users will be on another platform and the developers only need to call an API. Can I make the webapp callable from another platform?

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    The Dataiku API can be used from outside, using the Python client (if Python is available in your "other platform"). You can follow the documentation on https://doc.dataiku.com/dss/latest/python-api/rest-api-client/index.html#from-outside-dss.

    Alternatively, if Python is not available, you can call our REST API directly: https://doc.dataiku.com/dss/api/7.0/rest/

    This Public API covers what you want to achieve: setting project variables, triggering a scenario, waiting for results, and fetching the result dataset.
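
To illustrate the REST route without the Python client, here is a rough sketch that only builds the HTTP requests using the standard library. The `/public/api` base path, the endpoint paths, the `format` value, and the Basic-auth scheme (API key as username, empty password) are written from memory of the public API reference and should be checked against the linked documentation. The helper names are hypothetical, and polling for scenario completion is left out.

```python
import base64
import json
import urllib.request


def build_request(base_url, api_key, method, path, body=None):
    """Build (but do not send) an authenticated request to the DSS public API."""
    # Assumption: API key as HTTP Basic username with an empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    url = f"{base_url.rstrip('/')}/public/api{path}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def pipeline_requests(base_url, api_key, project_key, scenario_id, dataset, variables):
    """Return the three requests in call order: set variables, run scenario, fetch data."""
    prefix = f"/projects/{project_key}"
    return [
        # `variables` is the full object, e.g. {"standard": {...}, "local": {...}}.
        build_request(base_url, api_key, "PUT", f"{prefix}/variables/", variables),
        build_request(base_url, api_key, "POST",
                      f"{prefix}/scenarios/{scenario_id}/run", {}),
        build_request(base_url, api_key, "GET",
                      f"{prefix}/datasets/{dataset}/data/?format=tsv-excel-noheader"),
    ]
```

Each request can then be sent with `urllib.request.urlopen(req)`; the data request should only be sent once the scenario run has completed.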

  • Bader
    Bader Registered Posts: 46 ✭✭✭✭✭

    Thanks Alex. How about deployment? Should I create a separate API node? If yes, should I package my API, bundle my project, and then push it to the API node?
