Scenario SQL result limited to 10
Hello Community,
Small question: If I try to run some SQL code inside a scenario
SELECT Order_id, COUNT(1) FROM mySQL GROUP BY Order_id ORDER BY Order_id
the number of result seems always to be limited to 10 (as can be seen in the attached log, lines 16 ).
It seems that instead of the code above the following code seems to be executed
Initialize statement for 'select top(10) Order_id, COUNT(1) FROM MySQL GROUP BY Order_id ORDER BY Order_id' (maxRows=10)
What do I need to change in order to get the correct results (17 distinct values).
Thanks in advance
Marc Robert
Best Answer
-
Hello,
The SQL step of a scenario was made to retrieve data so they can be used to determine next scenario steps.
It is not intended to retrieve and log a full dataset.
If you want to extract a subpart of your table from a query, you can create a new SQL dataset from a query or you can use an SQL query recipe.
You can also create an SQL notebook if you just want to explore your database.Hope this helps!
Answers
-
MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron
Hello Pierre,
Thank you for your quick responds.
-
MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron
Hello Community.
Is it possible to run a SQL/Python notebook from inside a scenario (and thus not having the limitation of 10 records?)
Thanks in advance
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron
Hello @MRvLuijpen
,Yes, you can execute a SQL statement from Python within an Execute Python step in a Scenario. Check out the deck here for examples of interacting with SQL through Python: https://community.dataiku.com/t5/Online-Events/Using-the-Dataiku-DSS-Python-API-for-Interfacing-with-SQL/ba-p/7797
You should be able to use most of methods described in a Scenario.
What are you trying to accomplish?
Marlan
-
Question on the same line. I would like to send a SQL delete statement on a table to one of our connected database. But when tries in Python code, it doesn't do anything to the table. SQL select statement works though.
If I use SQL query in scenario with delete statement, it will only delete 10 rows at a time.
Any info would be really appreciated!
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
@reneege
,From the details you have provided, I'm not clear if I can be of any help. However, It may be helpful to others in the community to understand a few additional things to help out:
- What database are you using?
- What type of Connection have you set up in the Dataiku Connection?
- In what context are you running your Python? (Notebook, Visual Recipe, Completely external to DSS)
- And what version of DSS you are using.
- What database are you using?
-
Hi The database is a MS-SQL. The connection used is one of SQL connections we having been used successfully (in Select statements, or write recipe output datasets to).
The test was done by SQL inside Python code both in Visual Recipe or in Notebook inside DSS.
Just you know that I got it worked around in Python to get read dataset from the table, filtered out data in dataframe and write output dataset in the same table in the database.
I feel there may be some glitch/bug in DSS that using Delete SQL statement in Python is not working.
We are on DSS 8.0.2.
-
Can you please have someone update the documentation to reflect this?
It currently says on https://doc.dataiku.com/dss/latest/scenarios/steps.html
"Execute SQL
This step executes one or more SQL statements on a DSS connection. Both straight SQL connections (ex: a Postgresql connection) and HiveQL connections (Hive and Impala) can be used.
The output of the query, if there is one, is available to subsequent steps as variables (See Variables in scenarios)."
"The output of the query, if there is one" would suggest that this is suitable for doing things other than select. I've just wasted a bunch of time trying to figure out why a simple insert into a table was only inserting 10 rows in a scenario SQL Step as opposed to the correct amount; thankfully, google led me here. If you're looking to cause frustration for your customers, this is an excellent way to go about it!
-
This should really be documented. The method name and documentation never imply that the results are limited to 10 rows.