Filter and join multiple outputs
Hi everyone,
A newbie question.
When using the Filter recipe is it possible to see both the True and False outputs ? to be used separately in the flow.
Also the same for the Join recipe, being able to see the data that doesnt fit the join ?
Thanks
Mark
Answers
-
Hi @mgirdwood
,
For both filter and join recipes, you will have to either create two separate recipes to see the differences and be able to utilize both in your flow, or you would have to modify the single recipe and rebuild the flow. It is required to run those recipes in order to see the data.
If it helps, simple left-joins on relatively small datasets can be performed using a Prepare recipe or visual analysis, which will visually show the output dataset before actually running the recipe. https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/prepare-advanced-joins.html
Also, for filtering you can also use a prepare recipe or just by going to the Explore tab of the dataset and utilizing the interactive filtering there. An example for how to filer dates through the Explore tab and prepare recipe can be found here: https://knowledge.dataiku.com/latest/courses/advanced-data-prep/prepare-recipe/interactive-date-filter.html
Hope this helps!
- Elias -
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
I'm not clear that I completely understand your question. If we are missing the point so far. Please feel free to share some more details and even an example can be helpful
However, as @EliasH
has pointed out a Left Join gives you all of the values from the first data set (the left set) and only those values from the second data set (the right set) that match. The record from the first set that don't match the second set will be missing values in the columns that would have come from the second set. (So the false for the match items would in effect be blank.)Note there are some caveats to this approach. If you have multiple values in the second set that match a record in the first set, records in that first dataset might appear in your results more than once. And it is possible to miss values from the second set using a left join because they don't match even one record in the first set.
There are other types of joins out there that can be used:
https://www.w3schools.com/sql/sql_join.asp
Depending on the data you have a full outer join might also be helpful. However, this requires additional technology like a SQL database that supports this feature.
https://community.dataiku.com/t5/Using-Dataiku/Full-outer-join/m-p/732
If a full outer join is what you would like my colleague @Ignacio_Toledo
has helped out with another thread in the community that may be interesting to you.Let us know how you are getting on with your project.
--Tom
-
Hi Tom
I guess i over complicated the question, i was hoping to have two outputs from the one Filter recipe available directly. the records that matched the criteria of the Filter in one output and the rest in a second output. but i think the only way to do this is to have 2 separate filters with the opposite rules in each.
Similar for the Join, ie in a Left Join output in one list all the matching records based on the Left table and all the remaining records that did not match the join in a second list. Not blank Records in the one list where the data didn't match.
Mark
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Thanks for getting back to us. Have you looked at the split recipe? Lots of options there for defining what data goes into which data set. It will produce two or more resulting datasets based on a single dataset input.
Let us know if that is helpful. If not tell us more.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @mgirdwood
!Have you checked this plugin? Apparently it works only with datasets coming from SQL databases, but it does exactly what you need, by creating 3 output datasets.
Hope this helps!
Ignacio
-
Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker
Hi @mgirdwood
,We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contains all the unmatched rows from either a left, right, or inner join. You'll be able to see the data that doesn't fit the join easily!
Cheers,
Ashley