ORDER BY in SQL recipe does nothing, yet it works in SQL notebook

My team uses Dataiku to write flows consisting mostly of SQL recipes with data stored on an Amazon Redshift server. I have noticed the odd behavior that ORDER BY clauses in SQL recipes do not seem to have any effect on the resulting Dataiku dataset. They do, however, still work inside SQL notebooks, which suggests that it's not a matter of my writing the ORDER BY clause incorrectly.
As a reproducible example, I uploaded the R demo dataset 'mtcars' to Dataiku and synced it to our Redshift server as a Redshift table. I run the following query:
SELECT gear, am, vs, cyl, mpg, wt
FROM "FOO_BAR_mtcars_rs"
ORDER BY gear, am, vs, cyl, mpg
Within a SQL notebook, this creates the anticipated ordering.
Within the flow, however, the resulting Redshift table does not have the anticipated ordering. In column "gear", values of 3 and 4 are interleaved, rather than all the 3s appearing, then the 4s. Within values of gear, columns am, vs, cyl, and mpg don't appear to have any meaningful ordering either.
Can anybody reproduce and explain this issue?
Operating system used: Windows
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,407 Neuron
This is by design. The Dataset Explore tab allows you to sort the data but this doesn't work on the real data but a sample. The bigger question is why do you need the data sorted? Sorting is only usually needed needed for display purposes. Most recipes don't support sorting and SQL databases can't not guarantee the ordering of the data returned unless the order by clause is used.
-
I would like the data sorted because I then export it as a CSV or Excel file for an automated email report. It is inelegant for me or the users to have to download the report and sort it within Excel. Is there a different way I should be preparing a table for reporting via email attachment?
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,407 Neuron
My preferred solution for that use case will be a Python recipe which will write the CSV or Excel file to a managed folder. Then I can send that in a Mail Reporter or Send Mail step. Are you looking for a code or no-code solution? Personally I much prefer the code solution here since I can customise the Excel file with a lot more features than the plain export you get from Dataiku.
-
I will use an R recipe to sort the data and save it to a managed folder, then. Thanks for explaining that the SQL is behaving as intended, however unexpected it was to me.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,407 Neuron
There is a Sort recipe as well but it's a bit useless unless output dataset technology supports storing / retrieving data ordered.