We have deployed an API using a Python Function.
It calls a scenario with a custom parameter.
This parameter is used to run the scenario steps for one partition of the data.
The scenario computes additional features and gets predictions from a deployed model.
Since the parameter drives different partitions, can we run multiple API calls in parallel?
In other words, can the same scenario be run in parallel for different partitions?
Please advise.
Operating system used: Linux
Hi @sujayramaiah,
Unfortunately, it isn't possible to run the same scenario multiple times in parallel.
If you want to build multiple partitions at once, I recommend redesigning your scenario so that it can build multiple partitions in a single run.
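If redesigning the scenario is an option, here is a rough sketch of that idea using the dataikuapi package. The scenario id, the "partitions" parameter name, and the partition identifiers are illustrative assumptions, not your actual setup:

```python
def to_partition_spec(partitions):
    """Join partition identifiers into a DSS partition spec ("/"-separated)."""
    return "/".join(partitions)

def run_for_partitions(host, api_key, project_key, scenario_id, partitions):
    """Trigger ONE scenario run that covers several partitions together."""
    import dataikuapi  # third-party: the dataiku-api-client package
    client = dataikuapi.DSSClient(host, api_key)
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    # Pass all partition ids to a single run instead of one run per partition
    return scenario.run({"partitions": to_partition_spec(partitions)})
```

How the scenario picks up the "partitions" value (for example, as a run parameter that a build step uses as its partition spec) depends on how your scenario steps are defined.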
Thanks,
Zach
I'm working on something similar right now. Can you say a bit more about what you have in mind?
Thanks for getting back, @ZachM.
Since we are trying to get real-time predictions from our API deployment, grouping a set of partitions so they can be processed by one scenario run is not an option for us at this time.
As an alternative, we are trying to run custom Python code from the project's code library in an API endpoint, to avoid scenarios completely.
This code will not update any datasets. Can such a function be run in parallel?
Using dataikuapi, we are able to execute a function from the library. However, when we need a SQL connection from the project in order to run a query, how can we execute it within the project?
We are able to get a handle to the project by specifying the host and api_key as shown below.
client = dataikuapi.DSSClient(host, api_key)
project = client.get_project(project_key)
How can we execute a custom Python function? Please advise.
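One possibility, sketched below with dataikuapi's streaming SQL query support (the connection name and query are placeholders, and this is an illustration rather than a definitive answer), is to run the SQL directly through the public API client:

```python
def query_connection(host, api_key, connection, query):
    """Run a SQL query on a DSS connection via the public API client."""
    import dataikuapi  # third-party: the dataiku-api-client package
    client = dataikuapi.DSSClient(host, api_key)
    # Stream the query results from the given DSS connection
    q = client.sql_query(query, connection=connection)
    rows = [row for row in q.iter_rows()]
    q.verify()  # raises if the query did not complete successfully
    return rows
```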
Hi,
The following example endpoint will run a SQL query using SQLExecutor2.
import dataiku

# Establish the connection to DSS, and set a default project
dataiku.set_remote_dss("http://HOST", "API_SECRET")
dataiku.set_default_project_key("YOUR_PROJECT")

def api_py_function():
    executor = dataiku.SQLExecutor2(connection="YOUR_SQL_CONNECTION")
    df = executor.query_to_df('SELECT * FROM "YOUR_TABLE" LIMIT 10')
    # Return the result as an array
    return df.values.tolist()
Is this what you were looking for?
More information about performing SQL queries: https://doc.dataiku.com/dss/latest/python-api/sql.html
More information about connecting to DSS from an API endpoint: https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#setting-up-the-connection-with-dss
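As a usage note, once the service is deployed, the endpoint above is called over HTTP on the API node. This is a sketch: the API node URL, service id, and endpoint id are placeholders, and the exact response shape can differ between DSS versions:

```python
def endpoint_url(api_node_url, service_id, endpoint_id):
    """Build the run URL for a Python-function endpoint on a DSS API node."""
    return "{}/public/api/v1/{}/{}/run".format(
        api_node_url.rstrip("/"), service_id, endpoint_id)

def call_endpoint(api_node_url, service_id, endpoint_id, params=None):
    """POST to the endpoint; the function's arguments go in the JSON body."""
    import requests  # third-party: pip install requests
    resp = requests.post(endpoint_url(api_node_url, service_id, endpoint_id),
                         json=params or {})
    resp.raise_for_status()
    return resp.json().get("response")
```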
Thanks,
Zach
Thanks a lot @ZachM!!! That worked!