The Dataiku Frontrunner Awards have launched to recognize your achievements! SUBMIT YOUR ENTRY

Keep rows with list of values in my column

aotearoanz
Level 1
Keep rows with list of values in my column

Hello,

I want to keep rows based on the list of values in my column. I have a list of 100 values that I want to check in my column. If the row satisfies one of the 100 values then I want to keep that row. I try using the prepare recipe using the filter data processor with the filter rows/cell values. I don't want to enter 100 values on this processor. Is there a better way to provide the list in Dataiku?

 

Thanks for your help.

 

aotearoanz

0 Kudos
4 Replies
VitaliyD
Dataiker
Dataiker

Hello,

You can still use the filter processor, but instead of adding 100 values, you can add one regular expression and use the "Regular expression" match mode. The regular expression can be a pattern (if it can be defined based on the values you need to keep) or a simple list of all values separated by a pipe. Please find an example in the screenshot below.

Screenshot 2021-04-20 at 23.48.34.png

I hope this helps.
Regards,
Vitaliy

0 Kudos
aotearoanz
Level 1
Author

hi @VitaliyD ,

Thank you for your response, I tried your suggestion however I am getting an error stating that 

"Not supported settings for 'FilterOnValue' processor. Cannot use SQL engine : PATTERN string matching is not transferable.

I am running this HIVE engine if that helps

Please find the attached screen shot

Thanks

 

aotearoanz

 

0 Kudos
AshleyW
Dataiker
Dataiker

Hi @aotearoanz ,

Another trick you might try is using a join to filter.

This will work if your list of 100 values exists as a single column w/ 100 rows (you could create an editable dataset or upload a CSV, for example). You can then inner join that data set with the one you're trying to filter; this will filter out all of the rows that didn't get a match between both datasets.

This will be easier to maintain than a string of 100 values separated by a pipe in case the values you want to filter with change over time, but might not work depending on what you're trying to match. If you're looking to match a substring, you could also try doing this with the Fuzzy Join recipe (new in v9). Let us know how it goes!

Best,

Ashley

 

aotearoanz
Level 1
Author

Hi @AshleyW ,

That's a great suggestion!  I will keep this on one my tips and tricks for Dataiku.

Thank you for your time and have a great day! 

Cheers

aotearoanz

 

 

0 Kudos
A banner prompting to get Dataiku DSS
Public