ORDER BY in SQL recipe does nothing, yet it works in SQL notebook

Thrillgard
Thrillgard Registered Posts: 3 ✭✭✭✭

My team uses Dataiku to write flows consisting mostly of SQL recipes with data stored on an Amazon Redshift server. I have noticed the odd behavior that ORDER BY clauses in SQL recipes do not seem to have any effect on the resulting Dataiku dataset. They do, however, still work inside SQL notebooks, which suggests that it's not a matter of my writing the ORDER BY clause incorrectly.

As a reproducible example, I uploaded the R demo dataset 'mtcars' to Dataiku and synced it to our Redshift server as a Redshift table. I run the following query:

SELECT gear, am, vs, cyl, mpg, wt
FROM "FOO_BAR_mtcars_rs"
ORDER BY gear, am, vs, cyl, mpg

Within a SQL notebook, this creates the anticipated ordering.

image.png

Within the flow, however, the resulting Redshift table does not have the anticipated ordering. In column "gear", values of 3 and 4 are interleaved, rather than all the 3s appearing, then the 4s. Within values of gear, columns am, vs, cyl, and mpg don't appear to have any meaningful ordering either.

image.png

Can anybody reproduce and explain this issue?

Operating system used: Windows

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,591 Neuron

    This is by design. The Dataset Explore tab allows you to sort the data but this doesn't work on the real data but a sample. The bigger question is why do you need the data sorted? Sorting is only usually needed needed for display purposes. Most recipes don't support sorting and SQL databases can't not guarantee the ordering of the data returned unless the order by clause is used.

  • Thrillgard
    Thrillgard Registered Posts: 3 ✭✭✭✭

    I would like the data sorted because I then export it as a CSV or Excel file for an automated email report. It is inelegant for me or the users to have to download the report and sort it within Excel. Is there a different way I should be preparing a table for reporting via email attachment?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,591 Neuron

    My preferred solution for that use case will be a Python recipe which will write the CSV or Excel file to a managed folder. Then I can send that in a Mail Reporter or Send Mail step. Are you looking for a code or no-code solution? Personally I much prefer the code solution here since I can customise the Excel file with a lot more features than the plain export you get from Dataiku.

  • Thrillgard
    Thrillgard Registered Posts: 3 ✭✭✭✭

    I will use an R recipe to sort the data and save it to a managed folder, then. Thanks for explaining that the SQL is behaving as intended, however unexpected it was to me.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,591 Neuron

    There is a Sort recipe as well but it's a bit useless unless output dataset technology supports storing / retrieving data ordered.

Setup Info
    Tags
      Help me…