Running a scenario with different parameters from API in parallel

Solved!
sujayramaiah
Level 2

We have deployed an API using a Python Function.

It calls a scenario with a custom parameter.

This parameter is used to run the scenario steps for one partition of the data.

The scenario computes additional features and gets predictions from a deployed model.

Since the parameter selects a different partition on each call, can we run multiple API calls in parallel?

In other words, can the same scenario be run in parallel for different partitions?

Please advise.


Operating system used: Linux

ZachM
Dataiker

Hi @sujayramaiah,

Unfortunately, it isn't possible to run the same scenario multiple times in parallel.

If you want to build multiple partitions at once, I recommend redesigning your scenario so that it can build multiple partitions in a single run.
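
To illustrate the single-run idea, here is a minimal sketch of triggering one scenario run for several partitions at once. It assumes the scenario reads a comma-separated custom parameter, given the hypothetical name "partitions" here (for example via Scenario().get_trigger_params() in a custom Python step), and that host, api_key, project_key and scenario_id are placeholders for your own values.

```python
def build_trigger_params(partition_ids):
    """Pack several partition identifiers into one custom scenario parameter.

    The parameter name "partitions" is hypothetical; use whatever name
    your scenario steps actually read.
    """
    ids = [str(p) for p in partition_ids]
    if not ids:
        raise ValueError("At least one partition id is required")
    return {"partitions": ",".join(ids)}


def run_scenario_for_partitions(host, api_key, project_key, scenario_id, partition_ids):
    """Start a single scenario run that covers all requested partitions."""
    # Imported lazily so build_trigger_params can be used without dataikuapi.
    import dataikuapi

    client = dataikuapi.DSSClient(host, api_key)
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    # One run for the whole batch, instead of one run per partition.
    return scenario.run_and_wait(params=build_trigger_params(partition_ids))
```

The scenario steps would then need to split the parameter and process each partition in turn; how they do that depends on your flow.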

Thanks,

Zach

tgb417

@ZachM

I'm working on something similar right now. Can you say a bit more about what you have in mind?

--Tom
sujayramaiah
Level 2
Author

Thanks for getting back to me, @ZachM.

Since we are trying to get real-time predictions from our API deployment, grouping several partitions so that one scenario run processes them is not an option for us at this time.

As an alternative, we are trying to run custom Python code from the Code Library in an API endpoint, to avoid scenarios completely.

This piece of code will not update any datasets. Can this function be run in parallel?

Using dataikuapi, we are able to execute a function from the library. However, the function needs to use one of the project's SQL connections to execute a SQL query. How can we execute it within the project?

We are able to get a handle to the project by specifying the host and api_key as shown below.

client = dataikuapi.DSSClient(host, api_key)
project = client.get_project(project_key)

How can we execute a custom python function? Please advise.

ZachM
Dataiker

Hi,

The following example endpoint will run a SQL query using SQLExecutor2.

import dataiku

# Establish the connection to DSS, and set a default project
dataiku.set_remote_dss("http://HOST", "API_SECRET")
dataiku.set_default_project_key("YOUR_PROJECT")


def api_py_function():
    executor = dataiku.SQLExecutor2(connection="YOUR_SQL_CONNECTION")
    df = executor.query_to_df('SELECT * FROM "YOUR_TABLE" LIMIT 10')
    # Return the result as an array
    return df.values.tolist()
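
If the endpoint should return rows for a single partition, a variant of the function above could take the partition as a parameter. This is only a sketch: the column name "partition_col" and the allowed character set are assumptions, and it relies on the set_remote_dss and set_default_project_key calls shown above having already run. Because the query is built as a string, the partition value is validated before it is inserted.

```python
import re

# Assumption: partition ids only contain letters, digits, "_" and "-".
PARTITION_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_partition_query(partition_id):
    """Build the SELECT for one partition, rejecting unexpected characters."""
    value = str(partition_id)
    if not PARTITION_PATTERN.fullmatch(value):
        raise ValueError("Invalid partition id: %r" % (partition_id,))
    # "YOUR_TABLE" and "partition_col" are placeholders.
    return 'SELECT * FROM "YOUR_TABLE" WHERE "partition_col" = \'%s\'' % value


def api_py_function(partition_id):
    # Imported lazily so the query builder can be checked without DSS.
    import dataiku

    executor = dataiku.SQLExecutor2(connection="YOUR_SQL_CONNECTION")
    df = executor.query_to_df(build_partition_query(partition_id))
    # Return the result as an array
    return df.values.tolist()
```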

Is this what you were looking for?

More information about performing SQL queries: https://doc.dataiku.com/dss/latest/python-api/sql.html

More information about connecting to DSS from an API endpoint: https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#setting-up-the-connection-with-dss
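
On the question of parallel calls, here is a rough client-side sketch that sends several requests to a Python function endpoint at once using a thread pool. It assumes the endpoint function accepts a parameter named partition_id (hypothetical), and api_node_url, service_id and endpoint_id are placeholders. Whether the calls are actually processed concurrently depends on how the API service is deployed, which is not covered here.

```python
from concurrent.futures import ThreadPoolExecutor


def call_for_partitions(call_endpoint, partition_ids, max_workers=4):
    """Call call_endpoint(partition_id) for each id from a thread pool.

    Returns a dict mapping each partition id to its result.
    """
    ids = list(partition_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        results = list(pool.map(call_endpoint, ids))
    return dict(zip(ids, results))


def make_endpoint_caller(api_node_url, service_id, endpoint_id):
    """Return a function that queries the endpoint for one partition."""

    def call(partition_id):
        # Imported lazily; a new client per call avoids sharing one
        # HTTP session between threads.
        import dataikuapi

        client = dataikuapi.APINodeClient(api_node_url, service_id)
        return client.run_function(endpoint_id, partition_id=partition_id)

    return call
```

For example, call_for_partitions(make_endpoint_caller(url, service, endpoint), ["A", "B"]) would issue two requests and collect both results.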

Thanks,

Zach

sujayramaiah
Level 2
Author

Thanks a lot, @ZachM! That worked!

