Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
suppose I want to filter the dataset with the condition :
(CITY = 'A' AND DATE <= '31Dec2021') OR (CITY ='B' AND DATE >= '10Jan2019')
for only AND conditions (which I think by default Dataiku accepts if we do all steps in the prepare recipe one after another)
But how can we impose this condition inside the prepare recipe to filter Dataset as per my requirement?
I'm a little confused by your question. I find the formula recipes step inside the Prepare recipe to be very helpful. Is there a reason you don't want to use the formula recipes? Are you working with a particular backend that does not currently work that way. Are you consumed about building a new column just to filter? Is this a user education/confidence issue?
In a formula recipe, you could wrap a bunch of AND and OR functions. Something like this.
OR(AND(CITY = 'A',DATE <= '31Dec2021'),AND(CITY ='B', DATE >= '10Jan2019'))
Note the above is just psudo code. You would also have to get those dates sorted out so you could do comparisons.
This could then be wrapped inside an if statement such that if the above is true then a new column gets the number 1 and if false the new column gets 0. Then you could add a filter step in the prepare recipe to filter on the new column.
Then you could use a filter step in the prepare recipe to include or remove the rows in question.
I see at least in DSS V9.0.2 that there is a recipe that combines both the filtering and formula creation into one step.
Another option that comes to mind is the filter recipe itself. However to do the mix of AND + OR logic you are wanting to implement here you would have to do this with either SQL or formulas.
Let us know if this has been helpful, or what specific constraints or limitations you are trying to work out.
Actually, I've tried the way you are saying but the results I got are different than my SQL filtered data.
& is that that condition DATE<='31Dec2019' will work for parsed date column properly or not?
That's why I wanted to try a different method.
So is your core question: Why is my SQL implementation of this filtering producing different results than the recipe-based filtering of your data?
Assuming that you are using exactly the same data source.
And you are constructing correct filters in both places this is likely only answerable by looking at exactly what the SQL is, and exactly what you are doing in the recipie.
You also seem to be asking about date formats. All dates in DSS need to be parsed before you can use them in formulas. Below is an example of such a setup.
I hope my attempts have been of a little bit of help.