Survey banner
The Dataiku Community is moving to a new home! Some short term disruption starting next week: LEARN MORE

Select output columns in a recipe

jlbellier
Level 3
Select output columns in a recipe

Hello,

I would like to know if there are limitations on recipes when we want to select output columns. I know I can do it with prepare recipes or join recipes; I saw that on split recipes, I could see the SQL query that will be performed, I can enter the box and do changes, but they are not taken into account. 

Here is my configuration :

I have a dataset A with a field <field1>. <field1> can contain null values. So I use a split recipe to isolate them. A1 will contain the null values of <field1> and A2 will contain the others. 

On A2, I create a join with a dataset B to get additional information from B. Let's call the result A3.
On A3, I create another left join on a dataset C, to get specific information on C. Let's call the result A4. 

I need to stack the results from A3 and A4. In order to get the same structure on both, here is what I did :

  • I create a SQL recipe on A3 to choose my output columns in the desired order. Let's call the result A5.
  • I create a SQL recipe on A4 to choose my output columns in the desired order. Let's call the result A6

Then I can do my stack recipe between A5 and A6 to produce the final result. 

So my questions are the following :

  • Is there an easier way to produce the same result ? 
  • Are there some limitations on recipes to select the output columns (and the order of them), and if yes, on which recipes do we have restrictions ?

I thank you in advance and wish you a good day. 

Best regards,

Jean-Luc.

0 Kudos
1 Reply
JordanB
Dataiker

Hi @jlbellier,

It seems like you could do a multi-table join for the following datasets (A2, B, C):

On A2, I create a join with a dataset B to get additional information from B. Let's call the result A3.
On A3, I create another left join on a dataset C, to get specific information on C. Let's call the result A4. 

To add additional tables to your join recipe, select the + button on the table you would like to join a dataset with.

Screenshot 2023-08-18 at 12.28.56 PM.png

Then, instead of the stack recipe, you can use a prepare recipe to order the columns and produce the final output.

You mentioned that you could not order your columns using the split recipe, I assume you are referring to the "computed columns" or "pre-filter" where you can add a sql expression. I would not recommend trying to alter the order of your columns here. My suggestion would be to either use a prepare recipe, sql recipe or python recipe. 

Let me know if you have any questions.

Thanks!

Jordan

 

0 Kudos