Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi everyone,
A newbie question.
When using the Filter recipe is it possible to see both the True and False outputs ? to be used separately in the flow.
Also the same for the Join recipe, being able to see the data that doesnt fit the join ?
Thanks
Mark
Hi @mgirdwood,
For both filter and join recipes, you will have to either create two separate recipes to see the differences and be able to utilize both in your flow, or you would have to modify the single recipe and rebuild the flow. It is required to run those recipes in order to see the data.
If it helps, simple left-joins on relatively small datasets can be performed using a Prepare recipe or visual analysis, which will visually show the output dataset before actually running the recipe. https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/prepare-advanced-joins.html
Also, for filtering you can also use a prepare recipe or just by going to the Explore tab of the dataset and utilizing the interactive filtering there. An example for how to filer dates through the Explore tab and prepare recipe can be found here: https://knowledge.dataiku.com/latest/courses/advanced-data-prep/prepare-recipe/interactive-date-filt...
Hope this helps!
- Elias
I'm not clear that I completely understand your question. If we are missing the point so far. Please feel free to share some more details and even an example can be helpful
However, as @EliasH has pointed out a Left Join gives you all of the values from the first data set (the left set) and only those values from the second data set (the right set) that match. The record from the first set that don't match the second set will be missing values in the columns that would have come from the second set. (So the false for the match items would in effect be blank.)
Note there are some caveats to this approach. If you have multiple values in the second set that match a record in the first set, records in that first dataset might appear in your results more than once. And it is possible to miss values from the second set using a left join because they don't match even one record in the first set.
There are other types of joins out there that can be used:
https://www.w3schools.com/sql/sql_join.asp
Depending on the data you have a full outer join might also be helpful. However, this requires additional technology like a SQL database that supports this feature.
https://community.dataiku.com/t5/Using-Dataiku/Full-outer-join/m-p/732
If a full outer join is what you would like my colleague @Ignacio_Toledo has helped out with another thread in the community that may be interesting to you.
Let us know how you are getting on with your project.
--Tom
Hi Tom
I guess i over complicated the question, i was hoping to have two outputs from the one Filter recipe available directly. the records that matched the criteria of the Filter in one output and the rest in a second output. but i think the only way to do this is to have 2 separate filters with the opposite rules in each.
Similar for the Join, ie in a Left Join output in one list all the matching records based on the Left table and all the remaining records that did not match the join in a second list. Not blank Records in the one list where the data didn't match.
Mark
Thanks for getting back to us. Have you looked at the split recipe? Lots of options there for defining what data goes into which data set. It will produce two or more resulting datasets based on a single dataset input.
Let us know if that is helpful. If not tell us more.
Hi @mgirdwood !
Have you checked this plugin? Apparently it works only with datasets coming from SQL databases, but it does exactly what you need, by creating 3 output datasets.
Hope this helps!
Ignacio
Hi @mgirdwood ,
We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contains all the unmatched rows from either a left, right, or inner join. You'll be able to see the data that doesn't fit the join easily!
Cheers,
Ashley