We have deployed an API using a Python Function.
It calls a scenario with a custom parameter.
This parameter is used to run the scenario steps for one partition of the data.
The scenario computes additional features and gets predictions from a deployed model.
Since the parameter drives different partitions, can we run multiple API calls in parallel?
In other words, can the same scenario be run in parallel for different partitions?
Please advise.
Operating system used: Linux
Hi @sujayramaiah,
Unfortunately, it isn't possible to run the same scenario multiple times in parallel.
If you want to build multiple partitions at once, I recommend redesigning your scenario so that it can build multiple partitions in a single run.
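If redesigning the scenario is an option, here is a rough sketch of that idea using the dataikuapi package. The scenario id, the "partitions" parameter name, and the partition identifiers are illustrative assumptions, not your actual setup:

```python
def to_partition_spec(partitions):
    """Join partition identifiers into a DSS partition spec ("/"-separated)."""
    return "/".join(partitions)

def run_for_partitions(host, api_key, project_key, scenario_id, partitions):
    """Trigger ONE scenario run that covers several partitions together."""
    import dataikuapi  # third-party: the dataiku-api-client package
    client = dataikuapi.DSSClient(host, api_key)
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    # Pass all partition ids to a single run instead of one run per partition
    return scenario.run({"partitions": to_partition_spec(partitions)})
```

How the scenario picks up the "partitions" value (for example, as a run parameter that a build step uses as its partition spec) depends on how your scenario steps are defined.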
Thanks,
Zach
I'm working on something similar right now. Can you say a bit more about what you have in mind?
Thanks for getting back, @ZachM.
Since we are trying to get real-time predictions from our API deployment, grouping a set of partitions so they can be processed by one scenario run is not an option for us at this time.
As an alternative, we are trying to run custom Python code from the project's code library in an API endpoint, to avoid scenarios completely.
This code will not update any datasets. Can such a function be run in parallel?
Using dataikuapi, we are able to execute a function from the library. However, when we need a SQL connection from the project in order to run a query, how can we execute it within the project?
We are able to get a handle to the project by specifying the host and api_key as shown below.
client = dataikuapi.DSSClient(host, api_key)
project = client.get_project(project_key)
How can we execute a custom Python function? Please advise.
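One possibility, sketched below with dataikuapi's streaming SQL query support (the connection name and query are placeholders, and this is an illustration rather than a definitive answer), is to run the SQL directly through the public API client:

```python
def query_connection(host, api_key, connection, query):
    """Run a SQL query on a DSS connection via the public API client."""
    import dataikuapi  # third-party: the dataiku-api-client package
    client = dataikuapi.DSSClient(host, api_key)
    # Stream the query results from the given DSS connection
    q = client.sql_query(query, connection=connection)
    rows = [row for row in q.iter_rows()]
    q.verify()  # raises if the query did not complete successfully
    return rows
```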
Hi,
The following example endpoint will run a SQL query using SQLExecutor2.
import dataiku

# Establish the connection to DSS, and set a default project
dataiku.set_remote_dss("http://HOST", "API_SECRET")
dataiku.set_default_project_key("YOUR_PROJECT")

def api_py_function():
    executor = dataiku.SQLExecutor2(connection="YOUR_SQL_CONNECTION")
    df = executor.query_to_df('SELECT * FROM "YOUR_TABLE" LIMIT 10')
    # Return the result as an array
    return df.values.tolist()
Is this what you were looking for?
More information about performing SQL queries: https://doc.dataiku.com/dss/latest/python-api/sql.html
More information about connecting to DSS from an API endpoint: https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#setting-up-the-connection-with-dss
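As a usage note, once the service is deployed, the endpoint above is called over HTTP on the API node. This is a sketch: the API node URL, service id, and endpoint id are placeholders, and the exact response shape can differ between DSS versions:

```python
def endpoint_url(api_node_url, service_id, endpoint_id):
    """Build the run URL for a Python-function endpoint on a DSS API node."""
    return "{}/public/api/v1/{}/{}/run".format(
        api_node_url.rstrip("/"), service_id, endpoint_id)

def call_endpoint(api_node_url, service_id, endpoint_id, params=None):
    """POST to the endpoint; the function's arguments go in the JSON body."""
    import requests  # third-party: pip install requests
    resp = requests.post(endpoint_url(api_node_url, service_id, endpoint_id),
                         json=params or {})
    resp.raise_for_status()
    return resp.json().get("response")
```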
Thanks,
Zach
Thanks a lot @ZachM!!! That worked!