How to reproduce a groupBy with selected values without aggregations ?

cretois · March 20

SELECT DISTINCT

	N1.COLUMN1 as "Column 1",

	max(N1.COLUMN2) as "Column 2"


FROM DB


GROUP BY N1.COLUMN1
ORDER BY N1.COLUMN1

Hello, I'm trying to reproduce an sql script with Dataiku recipes, and in that script, there is a groupBy with selected columns with no aggregation, how do I reproduce this with Dataiku recipes and PostgreSQL ? The issue is that I don't know what is the default aggregation SQLite used in the case above.

In that case, how to get "N1.COLUMN1" like SQLite would ?

If someone has the answer it would really help !

DarinB · March 21

Hi @cretois!

If I understand your question correctly, I believe the Distinct or Group Recipes could accomplish what you want.

For a group by with selected columns and no aggregation, use the Distinct Recipe:
1. Highlight your dataset in the flow and select the Distinct Recipe from the Action menu
2. Enter your table name and choose your data connection; here you could select a PostgreSQL connection
3. Configure the recipe by choosing which columns you want and then run it

My example output is just a table with the distinct values of "week"

For a group by with aggregations, use the Group Recipe:
1. Repeat steps from above just choose the Group Recipe and select a column in the Group By dropdown
2. Configure the recipe by choosing your desired aggregations and run it

My example output is just a table with the maximum "home_pts" for each "week"

cretois · March 24

https://community.dataiku.com/discussion/comment/45622#Comment_45622

Thank you for the quick response ! The issue is that the GroupBy I want to reproduce select a value WITHOUT aggregation, and some other WITH aggregation, and with Dataiku I don't know how to reproduce the "without" aggregation, I have tried with "FIRST", "LAST", "MIN" and "MAX" (it's a text) and I never get the same results as the SQL script :

SELECT DISTINCT

  N1.COLUMN1 as "Column 1",

  max(N1.COLUMN2) as "Column 2"
FROM DB
GROUP BY N1.COLUMN1
ORDER BY N1.COLUMN1

In the example above, there is no aggregation on N1.COLUMN1, and that's what I want to reproduce

Turribeach · March 24

I don't really understand what the issue is here. If you use a Group recipe then select to Group By COLUMN1 and select Max of COLUMN2 and it would do what you want. Otherwise post the a screen shot of the Group section of the Group recipe and why you think there is an issue with the output. Here is how my SQL query looks like. The DISTINCT clause in your sample SQL is redundant and not really required. If you group by a column you will get distinct values.

cretois · March 24

https://community.dataiku.com/discussion/comment/45650#Comment_45650

Sorry you're right, it was a wrong example, I'll show you with a more precise one :

SELECT DISTINCT  


  N1.COLUMN1 as "Column 1",  

  max(N1.COLUMN2) as "Column 2",
  N1.COLUMN3 as "Column 3",  


FROM DB

GROUP BY N1.COLUMN1

ORDER BY N1.COLUMN1

Forgot that you can select COLUMN1 because it's the Group Key, in that case how do I get COLUMN3 without aggregation ?

Turribeach · March 24

https://community.dataiku.com/discussion/comment/45651#Comment_45651

This is not a valid SQL (at least on regular SQL engines like Oracle, MS-SQL, MySQL). You must either add a column with an aggregate function or include it in the Group By clause. So either add COLUMN3 as a second group column or add it with an aggregate function. Those are your choices.

Will produce this SQL:

DarinB · March 24

Hi @cretois,

I believe @Turribeach is correct in that what you're asking would be result in invalid SQL.

You can accomplish what you want by using the Group recipe followed by a Join recipe.
1. Use Group to get your aggregations (from your pseudo code, group by "Column1" and calculate max of "Column2")
2. Use Join to combine your aggregations with your non-aggregated columns from the original dataset (join on "Column1" and include "Column2_max" & "Column3")

How to reproduce a groupBy with selected values without aggregations ?

Setup Info

Welcome!

Answers

Welcome!

Welcome!

Quick Links

Categories

Sign up to take part

How to reproduce a groupBy with selected values without aggregations ?

Setup Info

Welcome!

Answers

Welcome!

Welcome!

Quick Links

Categories