Announcing the winners & finalists of the Dataiku Frontrunner Awards 2021! Read their inspiring stories

Filter and join multiple outputs

mgirdwood
Level 2
Filter and join multiple outputs

Hi everyone,

A newbie question.

When using the Filter recipe is it possible to see both the True and False outputs ? to be used separately in the flow.

Also the same for the Join recipe, being able to see the data that doesnt fit the join ?

 

Thanks

Mark

0 Kudos
5 Replies
EliasH
Dataiker
Dataiker

Hi @mgirdwood,

For both filter and join recipes, you will have to either create two separate recipes to see the differences and be able to utilize both in your flow, or you would have to modify the single recipe and rebuild the flow. It is required to run those recipes in order to see the data.

If it helps, simple left-joins on relatively small datasets can be performed using a Prepare recipe or visual analysis, which will visually show the output dataset before actually running the recipe. https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/prepare-advanced-joins.html

Also, for filtering you can also use a prepare recipe or just by going to the Explore tab of the dataset and utilizing the interactive filtering there. An example for how to filer dates through the Explore tab and prepare recipe can be found here: https://knowledge.dataiku.com/latest/courses/advanced-data-prep/prepare-recipe/interactive-date-filt... 

Hope this helps!

- Elias

0 Kudos
tgb417
Neuron
Neuron

@mgirdwood ,

I'm not clear that I completely understand your question.  If we are missing the point so far.  Please feel free to share some more details and even an example can be helpful

However, as @EliasH has pointed out a Left Join gives you all of the values from the first data set (the left set) and only those values from the second data set (the right set) that match.  The record from the first set that don't match the second set will be missing values in the columns that would have come from the second set.  (So the false for the match items would in effect be blank.)

Note there are some caveats to this approach.  If you have multiple values in the second set that match a record in the first set, records in that first dataset might appear in your results more than once.  And it is possible to miss values from the second set using a left join because they don't match even one record in the first set.

There are other types of joins out there that can be used:

https://www.w3schools.com/sql/sql_join.asp

Depending on the data you have a full outer join might also be helpful.  However, this requires additional technology like a SQL database that supports this feature.

https://community.dataiku.com/t5/Using-Dataiku/Full-outer-join/m-p/732

If a full outer join is what you would like my colleague @Ignacio_Toledo has helped out with another thread in the community that may be interesting to you.

https://community.dataiku.com/t5/General-Discussion/I-have-Full-outer-join-best-way-to-merge-data-wi...

Let us know how you are getting on with your project.

--Tom

--Tom
0 Kudos
mgirdwood
Level 2
Author

Hi Tom

I guess i over complicated the question, i was hoping to have two outputs from the one Filter recipe available directly.  the records that matched the criteria of the Filter in one output and the rest in a second output.  but i think the only way to do this is to have 2 separate filters with the opposite rules in each.  

Similar for the Join,  ie in a Left Join output in one list all the matching records based on the Left table and all the remaining records that did not match the join in a second list.  Not blank Records in the one list where the data didn't match.  

Mark

0 Kudos
tgb417
Neuron
Neuron

@mgirdwood ,

Thanks for getting back to us.  Have you looked at the split recipe?  Lots of options there for defining what data goes into which data set.  It will produce two or more resulting datasets based on a single dataset input.

Split recipie.png

Let us know if that is helpful.  If not tell us more.

 

--Tom
0 Kudos

Hi @mgirdwood !

Have you checked this plugin? Apparently it works only with datasets coming from SQL databases, but it does exactly what you need, by creating 3 output datasets.

Hope this helps!

Ignacio

0 Kudos
A banner prompting to get Dataiku DSS