filter data with multiple columns with a specific value for each

Solved!
sunith992
Level 3

Hi,

Using the formula language in a Prepare recipe, I am looking to filter data on multiple columns, where each column must match a specific value or a value from a list.

Example: column 'A' has 5 values. For 3 of them, a row should be kept only if column 'B' has the value 'X'; for the remaining 2 values of column 'A', rows should be kept whatever the value of column 'B'.

(I have not been able to find a way to filter a column against a list of values.)

 

MiguelangelC
Dataiker

Hi,

You can use the formula language to filter rows against a list of values by using array functions. For example, in Capture.png, rows are filtered based on whether the 'pages_visited' value falls within a provided array.
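As a rough illustration outside DSS, the same membership test can be sketched in plain Python, where each row is a dict and the `in` operator plays the role of arrayContains. The rows and the list of accepted values below are hypothetical; only the 'pages_visited' column name comes from the example above.

```python
# Hypothetical rows; each dict stands for one row of the dataset.
rows = [
    {"user": "u1", "pages_visited": 2},
    {"user": "u2", "pages_visited": 5},
    {"user": "u3", "pages_visited": 7},
]

# Hypothetical list of accepted values (the "provided array").
allowed = [2, 3, 5]

# Keep a row only if its pages_visited value is in the list,
# which is the same test arrayContains(<array>, <column>) performs per row.
kept = [row for row in rows if row["pages_visited"] in allowed]

print([row["user"] for row in kept])  # ['u1', 'u2']
```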

You can build more complex filters by combining such checks with boolean operators and adding conditions on other columns.
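Applied to the scenario in the question, the rule combines two membership checks: keep a row if its 'A' value is in the restricted group and 'B' is 'X', or if its 'A' value is in the unrestricted group. The sketch below expresses that predicate in plain Python; the group contents (a1 to a5) and the sample rows are hypothetical placeholders.

```python
# Hypothetical split of the 5 values of column A.
RESTRICTED_A = {"a1", "a2", "a3"}   # kept only when B == "X"
UNRESTRICTED_A = {"a4", "a5"}       # kept with any value of B


def keep_row(row: dict) -> bool:
    """Return True if the row passes the combined filter."""
    a, b = row["A"], row["B"]
    return (a in RESTRICTED_A and b == "X") or (a in UNRESTRICTED_A)


rows = [
    {"A": "a1", "B": "X"},  # kept: restricted value with B == "X"
    {"A": "a2", "B": "Y"},  # dropped: restricted value with another B
    {"A": "a4", "B": "Y"},  # kept: unrestricted value
    {"A": "a5", "B": "X"},  # kept: unrestricted value
]

filtered = [row for row in rows if keep_row(row)]
print(filtered)  # [{'A': 'a1', 'B': 'X'}, {'A': 'a4', 'B': 'Y'}, {'A': 'a5', 'B': 'X'}]
```

In a Prepare recipe the same structure would be one formula that ORs two conditions, each using arrayContains on a literal array; the exact operator syntax should be checked against the formula documentation.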

Information about formula functions can be found in this article: https://doc.dataiku.com/dss/11.0/formula/index.html#array-functions

sunith992
Level 3
Author

Hi,

Thanks for the response. I had already visited that page for arrayContains, but unfortunately it didn't work for me. I assumed that 'item' could only be a single literal number or string and not a column, because the documentation does not state this clearly and its example does not use a variable as the second parameter.

I will try the approach shown in your screenshot. Thanks again.

 

arrayContains(array a, item) boolean

Returns whether the array a contains item

arrayContains([1, 2, 3], 5) returns false
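The doubt here is whether 'item' can come from a column. In a row-by-row formula, the item expression is evaluated once for each row, so a column reference simply supplies a different item per row. A plain-Python stand-in for the documented signature makes this concrete; array_contains is a hypothetical helper that mirrors arrayContains(array a, item), not the DSS implementation, so the actual formula behaviour should be confirmed in the recipe preview.

```python
def array_contains(a: list, item) -> bool:
    """Hypothetical stand-in for arrayContains(array a, item)."""
    return item in a


# The documented example: a literal item.
print(array_contains([1, 2, 3], 5))  # False

# The same call with the item taken from each row, the way a column
# reference would be evaluated once per row in a formula.
rows = [{"B": 2}, {"B": 5}]
print([array_contains([1, 2, 3], row["B"]) for row in rows])  # [True, False]
```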

AshleyW
Dataiker

@sunith992,

Could a join between two datasets, used as a way to filter the first one, do the trick? From your original post it's not clear which data you want to filter against what, but I've found a Join quite handy when I want to filter a dataset by a list of values.

Ashley
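Using a join as a filter amounts to a semi-join: keep the rows of the first dataset whose key appears in a second, reference dataset that holds the accepted values. A plain-Python sketch, with hypothetical datasets and a hypothetical key column 'A':

```python
# Hypothetical main dataset.
orders = [
    {"id": 1, "A": "a1"},
    {"id": 2, "A": "a7"},
    {"id": 3, "A": "a4"},
]

# Hypothetical reference dataset listing the accepted values of A.
accepted = [{"A": "a1"}, {"A": "a4"}, {"A": "a5"}]

# Build a set of keys once, then keep only matching rows (a semi-join:
# rows are filtered, no columns from the reference dataset are added).
accepted_keys = {row["A"] for row in accepted}
filtered = [row for row in orders if row["A"] in accepted_keys]

print([row["id"] for row in filtered])  # [1, 3]
```

With an inner join, the result matches this semi-join only if each value appears once in the reference dataset; duplicate keys there would duplicate the matching rows in the output.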