Dataiku formula language for SQL 'IN'?

Options
PANKAJ
PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

Suppose I want to run a SQL query in which query is like " WHERE City IN ('A', 'B', 'C') AND ID IN (1,2,3) OR NAME='John' ".

I'm trying this in prepare recipe --> filter rows with formula

what's the analogous formula for "IN" in Dataiku?

Best Answer

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    edited July 17 Answer ✓
    Options

    Hello @PANKAJ
    ,

    With DSS 9.0.0, this formula should do the work:

    ( arrayContains(['A', 'B', 'FAIL'], City) && arrayContains([1, 2, 3], ID) ) || NAME == 'John'

    I think it will work also with previos DSS versions, but I thought it was important to include the version just in case.

    Hope this helps.

Answers

  • PANKAJ
    PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner
    Options

    Thanks, man.

    But I found this in the Dataiku documentation before your reply.

    Anyway Thanks again @Ignacio_Toledo

  • PANKAJ
    PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner
    Options

    @Ignacio_Toledo
    What if we want to that for "NOT IN A LIST"?

    I mean how can we use 'arrayContains' for NOT IN?

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    edited July 17
    Options

    If I understand correctly, I'd try:

    arrayContains(['A', 'B', 'FAIL'], City) == "False"

    So, if the column City doesn't contains A, B or Fail arrayContains will return False, and with the comparison operator == "False" we get the result expected for NOT IN (is True that City is not in A, B or Fail)

    Hope this works!

  • PANKAJ
    PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner
    Options

    @Ignacio_Toledo
    No, I want to write that along with && ,|| Operators.

    So is there something like arraynotContains or Not before the list of values?

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    I can only think about:

    not(arrayContains([array...], column))

    The negation of array contains. And this can be used with the logical operators... however I think my previous solution, while longer, should also allow the user of the logical operators. But haven't tried.

    Cheers!

Setup Info
    Tags
      Help me…