Filter and join multiple outputs

Options
mgirdwood
mgirdwood Registered Posts: 7 ✭✭✭

Hi everyone,

A newbie question.

When using the Filter recipe is it possible to see both the True and False outputs ? to be used separately in the flow.

Also the same for the Join recipe, being able to see the data that doesnt fit the join ?

Thanks

Mark

Answers

  • EliasH
    EliasH Dataiker, Registered Posts: 34 Dataiker
    Options

    Hi @mgirdwood
    ,

    For both filter and join recipes, you will have to either create two separate recipes to see the differences and be able to utilize both in your flow, or you would have to modify the single recipe and rebuild the flow. It is required to run those recipes in order to see the data.

    If it helps, simple left-joins on relatively small datasets can be performed using a Prepare recipe or visual analysis, which will visually show the output dataset before actually running the recipe. https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/prepare-advanced-joins.html

    Also, for filtering you can also use a prepare recipe or just by going to the Explore tab of the dataset and utilizing the interactive filtering there. An example for how to filer dates through the Explore tab and prepare recipe can be found here: https://knowledge.dataiku.com/latest/courses/advanced-data-prep/prepare-recipe/interactive-date-filter.html

    Hope this helps!

    - Elias

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @mgirdwood
    ,

    I'm not clear that I completely understand your question. If we are missing the point so far. Please feel free to share some more details and even an example can be helpful

    However, as @EliasH
    has pointed out a Left Join gives you all of the values from the first data set (the left set) and only those values from the second data set (the right set) that match. The record from the first set that don't match the second set will be missing values in the columns that would have come from the second set. (So the false for the match items would in effect be blank.)

    Note there are some caveats to this approach. If you have multiple values in the second set that match a record in the first set, records in that first dataset might appear in your results more than once. And it is possible to miss values from the second set using a left join because they don't match even one record in the first set.

    There are other types of joins out there that can be used:

    https://www.w3schools.com/sql/sql_join.asp

    Depending on the data you have a full outer join might also be helpful. However, this requires additional technology like a SQL database that supports this feature.

    https://community.dataiku.com/t5/Using-Dataiku/Full-outer-join/m-p/732

    If a full outer join is what you would like my colleague @Ignacio_Toledo
    has helped out with another thread in the community that may be interesting to you.

    https://community.dataiku.com/t5/General-Discussion/I-have-Full-outer-join-best-way-to-merge-data-with-similar/m-p/11420

    Let us know how you are getting on with your project.

    --Tom

  • mgirdwood
    mgirdwood Registered Posts: 7 ✭✭✭
    Options

    Hi Tom

    I guess i over complicated the question, i was hoping to have two outputs from the one Filter recipe available directly. the records that matched the criteria of the Filter in one output and the rest in a second output. but i think the only way to do this is to have 2 separate filters with the opposite rules in each.

    Similar for the Join, ie in a Left Join output in one list all the matching records based on the Left table and all the remaining records that did not match the join in a second list. Not blank Records in the one list where the data didn't match.

    Mark

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @mgirdwood
    ,

    Thanks for getting back to us. Have you looked at the split recipe? Lots of options there for defining what data goes into which data set. It will produce two or more resulting datasets based on a single dataset input.

    Split recipie.png

    Let us know if that is helpful. If not tell us more.

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    Hi @mgirdwood
    !

    Have you checked this plugin? Apparently it works only with datasets coming from SQL databases, but it does exactly what you need, by creating 3 output datasets.

    Hope this helps!

    Ignacio

  • AshleyW
    AshleyW Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker
    Options

    Hi @mgirdwood
    ,

    We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contains all the unmatched rows from either a left, right, or inner join. You'll be able to see the data that doesn't fit the join easily!

    Cheers,

    Ashley

Setup Info
    Tags
      Help me…