Scenario SQL result limited to 10

Solved!
MRvLuijpen
Scenario SQL result limited to 10

Hello Community, 

Small question: If I try to run some SQL code inside a scenario

SELECT Order_id, COUNT(1)
FROM mySQL
GROUP BY Order_id
ORDER BY Order_id

the number of result seems always to be limited to 10 (as can be seen in the attached log, lines 16 ).

It seems that instead of the code above the following code seems to be executed

Initialize statement for 'select top(10)  Order_id, COUNT(1)
FROM MySQL
GROUP BY Order_id
ORDER BY Order_id' (maxRows=10)

What do I need to change in order to get the correct results (17 distinct values).

Thanks in advance

Marc Robert

0 Kudos
1 Solution
PierreB
Dataiker

Hello,

The SQL step of a scenario was made to retrieve data so they can be used to determine next scenario steps.

It is not intended to retrieve and log a full dataset.

If you want to extract a subpart of your table from a query, you can create a new SQL dataset from a query or you can use an SQL query recipe.
You can also create an SQL notebook if you just want to explore your database.

Hope this helps!

 

View solution in original post

9 Replies
PierreB
Dataiker

Hello,

The SQL step of a scenario was made to retrieve data so they can be used to determine next scenario steps.

It is not intended to retrieve and log a full dataset.

If you want to extract a subpart of your table from a query, you can create a new SQL dataset from a query or you can use an SQL query recipe.
You can also create an SQL notebook if you just want to explore your database.

Hope this helps!

 

MRvLuijpen
Author

Hello Pierre, 

Thank you for your quick responds.

0 Kudos
allan
Level 3

Can you please have someone update the documentation to reflect this?

It currently says on https://doc.dataiku.com/dss/latest/scenarios/steps.html

"Execute SQL

This step executes one or more SQL statements on a DSS connection. Both straight SQL connections (ex: a Postgresql connection) and HiveQL connections (Hive and Impala) can be used.

The output of the query, if there is one, is available to subsequent steps as variables (See Variables in scenarios)."

"The output of the query, if there is one" would suggest that this is suitable for doing things other than select. I've just wasted a bunch of time trying to figure out why a simple insert into a table was only inserting 10 rows in a scenario SQL Step as opposed to the correct amount; thankfully, google led me here. If you're looking to cause frustration for your customers, this is an excellent way to go about it!

 

Pavle
Level 1

This should really be documented. The method name and documentation never imply that the results are limited to 10 rows.

0 Kudos
MRvLuijpen
Author

Hello Community.

Is it possible to run a SQL/Python notebook from inside a scenario (and thus not having the limitation of 10 records?)

Thanks in advance

 

0 Kudos
Marlan

Hello @MRvLuijpen,

Yes, you can execute a SQL statement from Python within an Execute Python step in a Scenario. Check out the deck here for examples of interacting with SQL through Python: https://community.dataiku.com/t5/Online-Events/Using-the-Dataiku-DSS-Python-API-for-Interfacing-with...

You should be able to use most of methods described in a Scenario.

What are you trying to accomplish?

Marlan

reneege
Level 3

Question on the same line. I would like to send a SQL delete statement on a table to one of our connected database. But when tries in Python code, it doesn't do anything to the table. SQL select statement works though.

If I use SQL query in scenario with delete statement, it will only delete 10 rows at a time. 

Any info would be really appreciated!

 

 

0 Kudos
tgb417

@reneege ,

From the details you have provided, I'm not clear if I can be of any help.  However, It may be helpful to others in the community to understand a few additional things to help out:

  • What database are you using?
    • What type of Connection have you set up in the Dataiku Connection?
  • In what context are you running your Python? (Notebook, Visual Recipe, Completely external to DSS)
  • And what version of DSS you are using.
--Tom
0 Kudos
reneege
Level 3

Hi The database is a MS-SQL. The connection used is one of  SQL connections we having been used successfully (in Select statements, or write recipe output datasets to).

The test was done by SQL inside Python code both in Visual Recipe or in Notebook inside DSS.

Just you know that I got it worked around in Python to get read dataset from the table, filtered out data in dataframe and write output dataset in the same table in the database.

I feel there may be some glitch/bug in DSS that using Delete SQL statement in Python is not working.

We are on DSS 8.0.2.