Keep rows with list of values in my column

aotearoanz
aotearoanz Dataiku DSS Core Concepts, Registered Posts: 5 ✭✭✭✭

Hello,

I want to keep rows based on the list of values in my column. I have a list of 100 values that I want to check in my column. If the row satisfies one of the 100 values then I want to keep that row. I try using the prepare recipe using the filter data processor with the filter rows/cell values. I don't want to enter 100 values on this processor. Is there a better way to provide the list in Dataiku?

Thanks for your help.

aotearoanz

Answers

  • Vitaliy
    Vitaliy Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker

    Hello,

    You can still use the filter processor, but instead of adding 100 values, you can add one regular expression and use the "Regular expression" match mode. The regular expression can be a pattern (if it can be defined based on the values you need to keep) or a simple list of all values separated by a pipe. Please find an example in the screenshot below.

    Screenshot 2021-04-20 at 23.48.34.png

    I hope this helps.
    Regards,
    Vitaliy

  • aotearoanz
    aotearoanz Dataiku DSS Core Concepts, Registered Posts: 5 ✭✭✭✭

    hi @VitaliyD
    ,

    Thank you for your response, I tried your suggestion however I am getting an error stating that

    "Not supported settings for 'FilterOnValue' processor. Cannot use SQL engine : PATTERN string matching is not transferable.

    I am running this HIVE engine if that helps

    Please find the attached screen shot

    Thanks

    aotearoanz

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 163 Dataiker

    Hi @aotearoanz
    ,

    Another trick you might try is using a join to filter.

    This will work if your list of 100 values exists as a single column w/ 100 rows (you could create an editable dataset or upload a CSV, for example). You can then inner join that data set with the one you're trying to filter; this will filter out all of the rows that didn't get a match between both datasets.

    This will be easier to maintain than a string of 100 values separated by a pipe in case the values you want to filter with change over time, but might not work depending on what you're trying to match. If you're looking to match a substring, you could also try doing this with the Fuzzy Join recipe (new in v9). Let us know how it goes!

    Best,

    Ashley

  • aotearoanz
    aotearoanz Dataiku DSS Core Concepts, Registered Posts: 5 ✭✭✭✭

    Hi @AshleyW
    ,

    That's a great suggestion! I will keep this on one my tips and tricks for Dataiku.

    Thank you for your time and have a great day!

    Cheers

    aotearoanz

Setup Info
    Tags
      Help me…