Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on March 20, 2025 3:00PM
Likes: 1
Replies: 6
SELECT DISTINCT
N1.COLUMN1 as "Column 1",
max(N1.COLUMN2) as "Column 2"
FROM DB
GROUP BY N1.COLUMN1 ORDER BY N1.COLUMN1
Hello, I'm trying to reproduce an sql script with Dataiku recipes, and in that script, there is a groupBy with selected columns with no aggregation, how do I reproduce this with Dataiku recipes and PostgreSQL ? The issue is that I don't know what is the default aggregation SQLite used in the case above.
In that case, how to get "N1.COLUMN1" like SQLite would ?
If someone has the answer it would really help !
Hi @cretois!
If I understand your question correctly, I believe the Distinct or Group Recipes could accomplish what you want.
For a group by with selected columns and no aggregation, use the Distinct Recipe:
1. Highlight your dataset in the flow and select the Distinct Recipe from the Action menu
2. Enter your table name and choose your data connection; here you could select a PostgreSQL connection
3. Configure the recipe by choosing which columns you want and then run it
My example output is just a table with the distinct values of "week"
For a group by with aggregations, use the Group Recipe:
1. Repeat steps from above just choose the Group Recipe and select a column in the Group By dropdown
2. Configure the recipe by choosing your desired aggregations and run it
My example output is just a table with the maximum "home_pts" for each "week"
Thank you for the quick response ! The issue is that the GroupBy I want to reproduce select a value WITHOUT aggregation, and some other WITH aggregation, and with Dataiku I don't know how to reproduce the "without" aggregation, I have tried with "FIRST", "LAST", "MIN" and "MAX" (it's a text) and I never get the same results as the SQL script :
SELECT DISTINCT
N1.COLUMN1 as "Column 1",
max(N1.COLUMN2) as "Column 2" FROM DB GROUP BY N1.COLUMN1 ORDER BY N1.COLUMN1
In the example above, there is no aggregation on N1.COLUMN1, and that's what I want to reproduce
I don't really understand what the issue is here. If you use a Group recipe then select to Group By COLUMN1 and select Max of COLUMN2 and it would do what you want. Otherwise post the a screen shot of the Group section of the Group recipe and why you think there is an issue with the output. Here is how my SQL query looks like. The DISTINCT clause in your sample SQL is redundant and not really required. If you group by a column you will get distinct values.
Sorry you're right, it was a wrong example, I'll show you with a more precise one :
SELECT DISTINCT
N1.COLUMN1 as "Column 1",
max(N1.COLUMN2) as "Column 2", N1.COLUMN3 as "Column 3",
FROM DB
GROUP BY N1.COLUMN1
ORDER BY N1.COLUMN1
Forgot that you can select COLUMN1 because it's the Group Key, in that case how do I get COLUMN3 without aggregation ?
This is not a valid SQL (at least on regular SQL engines like Oracle, MS-SQL, MySQL). You must either add a column with an aggregate function or include it in the Group By clause. So either add COLUMN3 as a second group column or add it with an aggregate function. Those are your choices.
Will produce this SQL:
Hi @cretois,
I believe @Turribeach is correct in that what you're asking would be result in invalid SQL.
You can accomplish what you want by using the Group recipe followed by a Join recipe.
1. Use Group to get your aggregations (from your pseudo code, group by "Column1" and calculate max of "Column2")
2. Use Join to combine your aggregations with your non-aggregated columns from the original dataset (join on "Column1" and include "Column2_max" & "Column3")