Select output columns in a recipe

jlbellier
jlbellier Registered Posts: 22

Hello,

I would like to know if there are limitations on recipes when we want to select output columns. I know I can do it with prepare recipes or join recipes; I saw that on split recipes, I could see the SQL query that will be performed, I can enter the box and do changes, but they are not taken into account.

Here is my configuration :

I have a dataset A with a field <field1>. <field1> can contain null values. So I use a split recipe to isolate them. A1 will contain the null values of <field1> and A2 will contain the others.

On A2, I create a join with a dataset B to get additional information from B. Let's call the result A3.
On A3, I create another left join on a dataset C, to get specific information on C. Let's call the result A4.

I need to stack the results from A3 and A4. In order to get the same structure on both, here is what I did :

  • I create a SQL recipe on A3 to choose my output columns in the desired order. Let's call the result A5.
  • I create a SQL recipe on A4 to choose my output columns in the desired order. Let's call the result A6

Then I can do my stack recipe between A5 and A6 to produce the final result.

So my questions are the following :

  • Is there an easier way to produce the same result ?
  • Are there some limitations on recipes to select the output columns (and the order of them), and if yes, on which recipes do we have restrictions ?

I thank you in advance and wish you a good day.

Best regards,

Jean-Luc.

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker

    Hi @jlbellier
    ,

    It seems like you could do a multi-table join for the following datasets (A2, B, C):

    On A2, I create a join with a dataset B to get additional information from B. Let's call the result A3.
    On A3, I create another left join on a dataset C, to get specific information on C. Let's call the result A4.

    To add additional tables to your join recipe, select the + button on the table you would like to join a dataset with.

    Screenshot 2023-08-18 at 12.28.56 PM.png

    Then, instead of the stack recipe, you can use a prepare recipe to order the columns and produce the final output.

    You mentioned that you could not order your columns using the split recipe, I assume you are referring to the "computed columns" or "pre-filter" where you can add a sql expression. I would not recommend trying to alter the order of your columns here. My suggestion would be to either use a prepare recipe, sql recipe or python recipe.

    Let me know if you have any questions.

    Thanks!

    Jordan

Setup Info
    Tags
      Help me…