Formula for if an entry starts with
Hello - I am looking to add a formula in a data recipe which will classify accounts based on their first 3 digits.
So for account numbers that start with '053' then I want the new column to cll these out as EQUITY (for example).
I can create formula based on accounts that == but I don't the syntax for start with - or contains??
Any ideas on how best to achieve this ?
Answers
-
Even if I could create a new column which extracts the first 3 digits from the account column then I could use that, but I am also unsure how to do this. In Excel I could use a LEFT commmand
-
Hi,
Are you trying to process data using prepare recipe? if yes I could suggest using a "Replace" step. It allows you to define a regular expression patterns, so for the '053' -> 'EQUITY' the rule would look like this:
For the explanation of the demo expression that I used please refer to https://regex101.com/r/2mBStF/2
Regards
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
In your description, it appears that you are actually searching text strings, not integers. (I'm guessing this because you are referencing a number that begins with a "0".) You might try an expression like this in a Formula step of a prepare recipe. An expression like the below will yield a "true" or "false" boolean value.
arrayContains(splitByLengths(id,3),"053")
This can then be wrapped in an if statement to get the result you need.
if(arrayContains(splitByLengths(id,3),"053"),"EQUITY ","?")
Here is a similar example that shows just getting true and false. That then could be used in an if() formula.
-
Hi - thanks for coming back to me. I am intersted in this formula and it is working well for the most part but I am having an issue when the account numbers contain only numbers.
So an account number is 8 digits and these are a combination of numbers and letters - sometimes only numbers and this is where the formula fails.
if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity,""))
This will work when the account # in ACCOUNT contains letters (e.g. 04056F4D)
But will faill when numbers only (e.g. 04045699)The data type for ACCOUNT is string so I thought that this shouldn't be an issue?
Thanks again!
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
This formula looks to be missing a quotation mark. See the red " I added below.
if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity",""))
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
What version of DSS are you using? I'm using the latest version V7.0.3.
This formula in your post looks to be missing a quotation mark. See the red " I added below.
Also, the order of the parameters in the SplitByLengths() function seems to be backward I would assume you mean SplitByLengths(ACCOUNT,3) . This reflects the first 3 characters of the ACCOUNT column.
if(arraycontains(SplitByLengths(ACCOUNT,3),"040"),"Equity",""))
if none of that solves your problems you might wrap ACCOUNT in a strval() function.
if(arraycontains(SplitByLengths(strval("ACCOUNT"),3),"040"),"Equity",""))
Hope that helps.
P.S. you can find short descriptions of all of the formula functions here.
https://doc.dataiku.com/dss/7.0/advanced/formula.html
-
I have this working peachy now - thanks for your help!
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
For others who might come to this post I invite you to go ahead and mark whichever posts helped you solve the issue as "Accept as Solution". Or consider adding anything else you have learned so the next person who has this problem can find your result.
-
I'm trying to get a formula that returns True if a String of Column X contains something specific for Example "Warranty"
if text.contains([Column X], "Warranty"). "True", "False"
Thanks for the help