Select columns in Split and Filter recipe

luong_bayard · ‎10-26-2023

Hi,

I often prepare a dataset in the desired format and then need to split it into two or more datasets based on a single column that is not needed in the final output.

I then have to add one more recipe for each split dataset just to remove that column and get the desired schema, which is a big waste.

As for the Filter recipe, sometimes we just need a smaller version of a dataset, both vertically and horizontally, but the Filter recipe only allows us to filter rows, not select columns.

It would be nice to be able to select the desired columns in the Split and Filter recipes, as we already can in TopN, Join, etc.

Turribeach · ‎10-26-2023

I totally understand where this comes from, but if you start adding options to every visual recipe, you end up with really complicated visual recipes. The trade-off of using visual recipes is that you need to perform most actions in separate recipes. If you want to do everything at once or reduce the number of recipes, you should use a code recipe, such as Python.
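For what it's worth, the split-and-drop step is small in code. Below is a minimal sketch in plain Python, not the Dataiku API: the `split_and_drop` helper, the `orders` sample and the `region` column are all made up for illustration. In a DSS Python recipe the rows would presumably come from the input dataset and be written to one output dataset per split value, rather than collected in a dict as done here.

```python
from collections import defaultdict


def split_and_drop(rows, split_column):
    """Group rows by the value of split_column and drop that column.

    rows is any iterable of dicts (one dict per row).
    Returns a dict mapping each split value to its list of rows.
    """
    outputs = defaultdict(list)
    for row in rows:
        key = row[split_column]
        # Copy every field except the split column into the output row.
        outputs[key].append(
            {name: value for name, value in row.items() if name != split_column}
        )
    return dict(outputs)


# Hypothetical sample data: "region" is only needed to route the rows.
orders = [
    {"order_id": 1, "amount": 10.0, "region": "EU"},
    {"order_id": 2, "amount": 25.5, "region": "US"},
    {"order_id": 3, "amount": 7.25, "region": "EU"},
]

by_region = split_and_drop(orders, "region")
print(by_region["EU"])
```

Note that this sketch keeps every output row in memory, so for large datasets you would want to write each row out as soon as it is routed.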

luong_bayard · ‎10-27-2023

Hi @Turribeach, thanks a lot for your reply.

I understand what you are saying, and that is why, out of the many improvements I have in mind, I chose the two that most affect my team's everyday work.

We don't have Spark, so using Python means using our machine's own memory, which is quite costly, and not every user in my company is comfortable with coding (which is why visual recipes were a big selling point when my company switched to DSS).

The option to remove the split-on column is quite basic, and I think it is really necessary in a lot of use cases. The danger of not being able to select columns is that a scenario can fail when the schema of the input table changes. When we are not the only ones using the same input table, changes requested by others shouldn't impact everyone just because we bring along all of the columns every time.
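To make the schema point concrete, here is a rough sketch of what "filter rows and select columns in one step" means. It is plain Python, not a Dataiku API: `filter_and_select`, the `customers` sample and its column names are hypothetical. Because the output only carries the columns it names, a column added upstream by another team (here `comment`) does not change the output schema, and a column that disappears is reported by name.

```python
def filter_and_select(rows, keep_columns, predicate=lambda row: True):
    """Yield only the rows matching predicate, reduced to keep_columns.

    Extra columns in the input are ignored. A missing column raises a
    KeyError that names it, instead of failing somewhere downstream.
    """
    for row in rows:
        missing = [name for name in keep_columns if name not in row]
        if missing:
            raise KeyError(f"input is missing columns: {missing}")
        if predicate(row):
            yield {name: row[name] for name in keep_columns}


# Hypothetical input: "comment" was added upstream by someone else.
customers = [
    {"id": 1, "country": "FR", "age": 34, "comment": "new"},
    {"id": 2, "country": "DE", "age": 51, "comment": ""},
    {"id": 3, "country": "FR", "age": 17, "comment": ""},
]

adults_fr = list(
    filter_and_select(
        customers,
        keep_columns=["id", "age"],
        predicate=lambda row: row["country"] == "FR" and row["age"] >= 18,
    )
)
print(adults_fr)
```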

Labels

Data Exploration and Preparation

Designer Experience
