Formula for if an entry starts with

Options
SeparateFiction
SeparateFiction Registered Posts: 7 ✭✭✭✭

Hello - I am looking to add a formula in a data recipe which will classify accounts based on their first 3 digits.

So for account numbers that start with '053' then I want the new column to cll these out as EQUITY (for example).

I can create formula based on accounts that == but I don't the syntax for start with - or contains??

Any ideas on how best to achieve this ?

Answers

  • SeparateFiction
    SeparateFiction Registered Posts: 7 ✭✭✭✭
    Options

    Even if I could create a new column which extracts the first 3 digits from the account column then I could use that, but I am also unsure how to do this. In Excel I could use a LEFT commmand

  • Andrey
    Andrey Dataiker Alumni Posts: 119 ✭✭✭✭✭✭✭
    Options

    Hi,

    Are you trying to process data using prepare recipe? if yes I could suggest using a "Replace" step. It allows you to define a regular expression patterns, so for the '053' -> 'EQUITY' the rule would look like this:

    Screenshot 2020-07-21 at 17.30.27.png

    For the explanation of the demo expression that I used please refer to https://regex101.com/r/2mBStF/2

    Regards

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @SeparateFiction

    In your description, it appears that you are actually searching text strings, not integers. (I'm guessing this because you are referencing a number that begins with a "0".) You might try an expression like this in a Formula step of a prepare recipe. An expression like the below will yield a "true" or "false" boolean value.

    arrayContains(splitByLengths(id,3),"053")

    This can then be wrapped in an if statement to get the result you need.

    if(arrayContains(splitByLengths(id,3),"053"),"EQUITY ","?")

    Here is a similar example that shows just getting true and false. That then could be used in an if() formula.

    Formula to Search for SubString.jpg

  • SeparateFiction
    SeparateFiction Registered Posts: 7 ✭✭✭✭
    Options

    Hi - thanks for coming back to me. I am intersted in this formula and it is working well for the most part but I am having an issue when the account numbers contain only numbers.

    So an account number is 8 digits and these are a combination of numbers and letters - sometimes only numbers and this is where the formula fails.

    if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity,""))

    This will work when the account # in ACCOUNT contains letters (e.g. 04056F4D)
    But will faill when numbers only (e.g. 04045699)

    The data type for ACCOUNT is string so I thought that this shouldn't be an issue?

    Thanks again!

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    This formula looks to be missing a quotation mark. See the red " I added below.

    if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity",""))

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    What version of DSS are you using? I'm using the latest version V7.0.3.

    This formula in your post looks to be missing a quotation mark. See the red " I added below.

    Also, the order of the parameters in the SplitByLengths() function seems to be backward I would assume you mean SplitByLengths(ACCOUNT,3) . This reflects the first 3 characters of the ACCOUNT column.

    if(arraycontains(SplitByLengths(ACCOUNT,3),"040"),"Equity",""))

    if none of that solves your problems you might wrap ACCOUNT in a strval() function.

    if(arraycontains(SplitByLengths(strval("ACCOUNT"),3),"040"),"Equity",""))

    Hope that helps.

    P.S. you can find short descriptions of all of the formula functions here.

    https://doc.dataiku.com/dss/7.0/advanced/formula.html

  • SeparateFiction
    SeparateFiction Registered Posts: 7 ✭✭✭✭
    Options

    I have this working peachy now - thanks for your help!

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    For others who might come to this post I invite you to go ahead and mark whichever posts helped you solve the issue as "Accept as Solution". Or consider adding anything else you have learned so the next person who has this problem can find your result.

  • Nicolas_Isaza
    Nicolas_Isaza Dataiku DSS Core Designer, Registered Posts: 1
    Options

    I'm trying to get a formula that returns True if a String of Column X contains something specific for Example "Warranty"

    if text.contains([Column X], "Warranty"). "True", "False"

    Thanks for the help

Setup Info
    Tags
      Help me…