Select output columns in a recipe
Hello,
I would like to know if there are limitations on recipes when we want to select output columns. I know I can do it with prepare recipes or join recipes; I saw that on split recipes, I could see the SQL query that will be performed, I can enter the box and do changes, but they are not taken into account.
Here is my configuration :
I have a dataset A with a field <field1>. <field1> can contain null values. So I use a split recipe to isolate them. A1 will contain the null values of <field1> and A2 will contain the others.
On A2, I create a join with a dataset B to get additional information from B. Let's call the result A3.
On A3, I create another left join on a dataset C, to get specific information on C. Let's call the result A4.
I need to stack the results from A3 and A4. In order to get the same structure on both, here is what I did :
- I create a SQL recipe on A3 to choose my output columns in the desired order. Let's call the result A5.
- I create a SQL recipe on A4 to choose my output columns in the desired order. Let's call the result A6
Then I can do my stack recipe between A5 and A6 to produce the final result.
So my questions are the following :
- Is there an easier way to produce the same result ?
- Are there some limitations on recipes to select the output columns (and the order of them), and if yes, on which recipes do we have restrictions ?
I thank you in advance and wish you a good day.
Best regards,
Jean-Luc.
Answers
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
Hi @jlbellier
,It seems like you could do a multi-table join for the following datasets (A2, B, C):
On A2, I create a join with a dataset B to get additional information from B. Let's call the result A3.
On A3, I create another left join on a dataset C, to get specific information on C. Let's call the result A4.To add additional tables to your join recipe, select the + button on the table you would like to join a dataset with.
Then, instead of the stack recipe, you can use a prepare recipe to order the columns and produce the final output.
You mentioned that you could not order your columns using the split recipe, I assume you are referring to the "computed columns" or "pre-filter" where you can add a sql expression. I would not recommend trying to alter the order of your columns here. My suggestion would be to either use a prepare recipe, sql recipe or python recipe.
Let me know if you have any questions.
Thanks!
Jordan