Filter dataset with 'OR' condition without using formula recipe

Options
PANKAJ
PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

suppose I want to filter the dataset with the condition :

(CITY = 'A' AND DATE <= '31Dec2021') OR (CITY ='B' AND DATE >= '10Jan2019')

for only AND conditions (which I think by default Dataiku accepts if we do all steps in the prepare recipe one after another)

But how can we impose this condition inside the prepare recipe to filter Dataset as per my requirement?

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @PANKAJ
    ,

    I'm a little confused by your question. I find the formula recipes step inside the Prepare recipe to be very helpful. Is there a reason you don't want to use the formula recipes? Are you working with a particular backend that does not currently work that way. Are you consumed about building a new column just to filter? Is this a user education/confidence issue?

    In a formula recipe, you could wrap a bunch of AND and OR functions. Something like this.

    OR(AND(CITY = 'A',DATE <= '31Dec2021'),AND(CITY ='B', DATE >= '10Jan2019'))

    Note the above is just psudo code. You would also have to get those dates sorted out so you could do comparisons.

    This could then be wrapped inside an if statement such that if the above is true then a new column gets the number 1 and if false the new column gets 0. Then you could add a filter step in the prepare recipe to filter on the new column.

    Then you could use a filter step in the prepare recipe to include or remove the rows in question.

    I see at least in DSS V9.0.2 that there is a recipe that combines both the filtering and formula creation into one step.

    Filter Rows with Formula.jpg

    Another option that comes to mind is the filter recipe itself. However to do the mix of AND + OR logic you are wanting to implement here you would have to do this with either SQL or formulas.

    Filter recipie.jpg

    Let us know if this has been helpful, or what specific constraints or limitations you are trying to work out.

  • PANKAJ
    PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner
    Options

    @tgb417

    Actually, I've tried the way you are saying but the results I got are different than my SQL filtered data.

    & is that that condition DATE<='31Dec2019' will work for parsed date column properly or not?

    That's why I wanted to try a different method.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @PANKAJ
    ,

    So is your core question: Why is my SQL implementation of this filtering producing different results than the recipe-based filtering of your data?

    Assuming that you are using exactly the same data source.

    And you are constructing correct filters in both places this is likely only answerable by looking at exactly what the SQL is, and exactly what you are doing in the recipie.

    You also seem to be asking about date formats. All dates in DSS need to be parsed before you can use them in formulas. Below is an example of such a setup.

    Filtering Rows in formula.jpg

    I hope my attempts have been of a little bit of help.

Setup Info
    Tags
      Help me…