Filter Hive dataset based on file parameter

lakrouf_1 Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭✭

Hello everyone,

Hope you're doing well.

I've searched around but couldn't find a solution for this use case.

I would like to filter a dataset by uploading a file that contains the values to filter on.

To illustrate my use case, I have:

table x: in a Hive dataset
and the values for the WHERE condition will be loaded from a .csv file

Do you have any idea how to do this?

Thanks in advance.


  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi @lakrouf_1! Can you provide any further details on the thread to help users find a solution (for example, your DSS version)? Also, can you let us know if you've tried any fixes already? This should lead to a quicker response from the community.

  • AndrewM
    AndrewM Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 20 Dataiker

    Hi, @lakrouf_1,

    Based on the description of the problem, you simply want to filter the data from A (the Hive dataset) based on a column of values from B (the csv file). If so, this can be accomplished with a Join Recipe that uses an Inner Join between the column of your Hive dataset that needs to be filtered and the column of filter values from the csv file. Your flow would look like this:

    Screen Shot 2021-01-06 at 2.39.55 PM.png

    and your Join Recipe would look similar to this, where col_0 contains the values from the csv file to filter by:

    Screen Shot 2021-01-06 at 2.40.55 PM.png

    Thank you,

    Andrew M
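
For readers who want to see the logic outside of DSS, here is a minimal Python sketch of what the inner join above amounts to: keep only the rows of the table whose key value appears in the csv file. It is an illustration, not the Join Recipe itself. The table contents, the key column name `id`, and the helper `filter_rows_by_csv_values` are all hypothetical; only the filter column name `col_0` comes from the answer above.

```python
import csv
import io


def filter_rows_by_csv_values(rows, filter_csv_text, key_column, filter_column="col_0"):
    """Keep only the rows whose key_column value appears in the filter CSV."""
    reader = csv.DictReader(io.StringIO(filter_csv_text))
    # Collect the filter values in a set: lookups are fast and repeated
    # values in the file cannot cause a row to be returned twice.
    allowed = {record[filter_column].strip() for record in reader}
    return [row for row in rows if str(row[key_column]) in allowed]


# Hypothetical stand-in for the Hive table "x".
table_x = [
    {"id": "1", "name": "alpha"},
    {"id": "2", "name": "beta"},
    {"id": "3", "name": "gamma"},
]

# Hypothetical contents of the uploaded csv file (note the repeated value 3).
filter_csv = "col_0\n1\n3\n3\n"

filtered = filter_rows_by_csv_values(table_x, filter_csv, key_column="id")
print(filtered)
```

One difference worth keeping in mind: the sketch deduplicates the filter values, whereas a plain inner join returns a matching row once per matching filter value. If the csv file can contain repeated values, it is probably worth deduplicating the filter dataset before the join.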
