Scenario SQL result limited to 10

MRvLuijpen
MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron
edited July 16 in Using Dataiku

Hello Community,

Small question: If I try to run some SQL code inside a scenario

SELECT Order_id, COUNT(1)
FROM mySQL
GROUP BY Order_id
ORDER BY Order_id

the number of result seems always to be limited to 10 (as can be seen in the attached log, lines 16 ).

It seems that instead of the code above the following code seems to be executed

Initialize statement for 'select top(10)  Order_id, COUNT(1)
FROM MySQL
GROUP BY Order_id
ORDER BY Order_id' (maxRows=10)

What do I need to change in order to get the correct results (17 distinct values).

Thanks in advance

Marc Robert

Best Answer

  • PierreB
    PierreB Dataiker Posts: 6 Dataiker
    Answer ✓

    Hello,

    The SQL step of a scenario was made to retrieve data so they can be used to determine next scenario steps.

    It is not intended to retrieve and log a full dataset.

    If you want to extract a subpart of your table from a query, you can create a new SQL dataset from a query or you can use an SQL query recipe.
    You can also create an SQL notebook if you just want to explore your database.

    Hope this helps!

Answers

  • MRvLuijpen
    MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron

    Hello Pierre,

    Thank you for your quick responds.

  • MRvLuijpen
    MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron

    Hello Community.

    Is it possible to run a SQL/Python notebook from inside a scenario (and thus not having the limitation of 10 records?)

    Thanks in advance

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron

    Hello @MRvLuijpen
    ,

    Yes, you can execute a SQL statement from Python within an Execute Python step in a Scenario. Check out the deck here for examples of interacting with SQL through Python: https://community.dataiku.com/t5/Online-Events/Using-the-Dataiku-DSS-Python-API-for-Interfacing-with-SQL/ba-p/7797

    You should be able to use most of methods described in a Scenario.

    What are you trying to accomplish?

    Marlan

  • reneege
    reneege Registered Posts: 23 ✭✭✭✭

    Question on the same line. I would like to send a SQL delete statement on a table to one of our connected database. But when tries in Python code, it doesn't do anything to the table. SQL select statement works though.

    If I use SQL query in scenario with delete statement, it will only delete 10 rows at a time.

    Any info would be really appreciated!

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    @reneege
    ,

    From the details you have provided, I'm not clear if I can be of any help. However, It may be helpful to others in the community to understand a few additional things to help out:

    • What database are you using?
      • What type of Connection have you set up in the Dataiku Connection?
    • In what context are you running your Python? (Notebook, Visual Recipe, Completely external to DSS)
    • And what version of DSS you are using.
  • reneege
    reneege Registered Posts: 23 ✭✭✭✭

    Hi The database is a MS-SQL. The connection used is one of SQL connections we having been used successfully (in Select statements, or write recipe output datasets to).

    The test was done by SQL inside Python code both in Visual Recipe or in Notebook inside DSS.

    Just you know that I got it worked around in Python to get read dataset from the table, filtered out data in dataframe and write output dataset in the same table in the database.

    I feel there may be some glitch/bug in DSS that using Delete SQL statement in Python is not working.

    We are on DSS 8.0.2.

  • allan
    allan Registered Posts: 13 ✭✭✭✭

    Can you please have someone update the documentation to reflect this?

    It currently says on https://doc.dataiku.com/dss/latest/scenarios/steps.html

    "Execute SQL

    This step executes one or more SQL statements on a DSS connection. Both straight SQL connections (ex: a Postgresql connection) and HiveQL connections (Hive and Impala) can be used.

    The output of the query, if there is one, is available to subsequent steps as variables (See Variables in scenarios)."

    "The output of the query, if there is one" would suggest that this is suitable for doing things other than select. I've just wasted a bunch of time trying to figure out why a simple insert into a table was only inserting 10 rows in a scenario SQL Step as opposed to the correct amount; thankfully, google led me here. If you're looking to cause frustration for your customers, this is an excellent way to go about it!

  • Pavle
    Pavle Registered Posts: 2

    This should really be documented. The method name and documentation never imply that the results are limited to 10 rows.

Setup Info
    Tags
      Help me…