I want to keep rows based on the list of values in my column. I have a list of 100 values that I want to check in my column. If the row satisfies one of the 100 values then I want to keep that row. I try using the prepare recipe using the filter data processor with the filter rows/cell values. I don't want to enter 100 values on this processor. Is there a better way to provide the list in Dataiku?
Thanks for your help.
You can still use the filter processor, but instead of adding 100 values, you can add one regular expression and use the "Regular expression" match mode. The regular expression can be a pattern (if it can be defined based on the values you need to keep) or a simple list of all values separated by a pipe. Please find an example in the screenshot below.
I hope this helps.
hi @VitaliyD ,
Thank you for your response, I tried your suggestion however I am getting an error stating that
"Not supported settings for 'FilterOnValue' processor. Cannot use SQL engine : PATTERN string matching is not transferable.
I am running this HIVE engine if that helps
Please find the attached screen shot
Hi @aotearoanz ,
Another trick you might try is using a join to filter.
This will work if your list of 100 values exists as a single column w/ 100 rows (you could create an editable dataset or upload a CSV, for example). You can then inner join that data set with the one you're trying to filter; this will filter out all of the rows that didn't get a match between both datasets.
This will be easier to maintain than a string of 100 values separated by a pipe in case the values you want to filter with change over time, but might not work depending on what you're trying to match. If you're looking to match a substring, you could also try doing this with the Fuzzy Join recipe (new in v9). Let us know how it goes!