API Node: Creating a Python endpoint that executes arbitrary SQL passed in the request
Hi everyone,
I’m working with Dataiku API Node (Python endpoint) and I’m exploring an API pattern where the endpoint can execute a SQL query that is passed dynamically in the API call.
Use case
The goal is to:
- Create a Python API endpoint
- Accept a SQL query as part of the request payload, for example:
{
"query": "SELECT COUNT(*) FROM TABLE"
}
- Execute that query on a predefined database connection (e.g.
SNOWFLAKE) - Return the results as JSON
This endpoint would only be used internally, but I’m fully aware that executing arbitrary SQL brings security concerns, so I’m trying to understand what is possible with Dataiku.
Is there a way to interact with a snowflake database using a sql statement in a Python endpoint ?
Here is what I tried :
import dataiku
from dataiku.core.sql import SQLExecutor2
dataiku.set_default_project_key("PROJECT_KEY")
# function to execute the query using SQLExecutor2
def executeQuery(database, query):
query_result = SQLExecutor2(connection=database).query_to_df(query)
return query_result
def api_py_function():
return executeQuery("SNOWFLAKE_CXN_NAME",
"SELECT COUNT(*) FROM SCEHMAS.TABLE")
result : no adapters were found for intercom/sql-queries/start-streaming
Thanks in advance.
Dataiku version used: 13.5.5
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,696 NeuronWhy are you using a Python endpoint if there is already a built-in SQL Query endpoint type?
-
Hi,
I am unsure what you are doing, and your code seems to be just boilerplate. So, it is not easy to help you with this subject.
Did you look at the documentation?
Does a code like this do the job:
dataset = dataiku.Dataset("pro_customers_sql") table_name = dataset.get_location_info().get('info', {}).get('quotedResolvedTableName') executor = SQLExecutor2(dataset=dataset) cid = Constant(str(customerId)) escaped_cid = toSQL(cid, dialect=Dialects.POSTGRES) # Replace by your DB query_reader = executor.query_to_iter( f"""SELECT "name", "job", "company" FROM {table_name} WHERE "id" = {escaped_cid}""") -
The SQL Query endpoint is a static way of querying the database, I want a solution that would allow me to do any query through the API call on the connexion.
-
Hello, thanks for taking a look, while reading the documentation I did another version of my python endpoint :
from dataiku import SQLExecutor2
from dataiku.apinode import utils
# Endpoint API DSS
def api_py_function():
# Connect to DSS and get a client
client = utils.get_self_client()
executor = SQLExecutor2(connection="SNOWFLAKE")
query_reader = executor.query_to_iter("SELECT 1 FROM DUAL")
query_iterator = query_reader.iter_tuples()
return query_iteratorI get the error :
Failed: Failed to run function : <class 'Exception'> : Default project key is not specified (no DKU_CURRENT_PROJECT_KEY in env)
-
Hello,
And what about using: ?
-
After doing further testing, I found that the SQL drivers are not available on the aks pod where is hosted the python api.
Dataiku API Node does not support dynamic SQL execution via DSS-managed connections. This is a deliberate architectural and security constraint. The only supported way to execute SQL on API Node is through SQL Query endpoints with predefined queries.
Sadly, a SQL Endpoint like 'SELECT ?' wont be usable by the Python Endpoint as well since the parameters are passed in a Dict and surrounded by single quotes which will result in for example :
{'QUERY' : '1 FROM DUAL'} → SELECT '1 FROM DUAL' in Snowflake will give us the result of the API '1 FROM DUAL' instead of '1'.This behavior is intentional and consistent across databases. It prevents SQL injection, enforces query validation, and preserves execution plans. Consequently, SQL parameters cannot be used to alter the structure of a query, only its values.
The idea of exposing an API endpoint capable of executing arbitrary SQL passed dynamically by callers is generally an anti‑pattern. It introduces significant risks in terms of security (SQL injection, privilege escalation, data exfiltration), operational stability (unbounded queries, large result sets).
