Community Conundrum 25:Feature Visualization is now live! Read More

Formula for if an entry starts with

Level 2
Formula for if an entry starts with

Hello - I am looking to add a formula in a data recipe which will classify accounts based on their first 3 digits.

So for account numbers that start with '053' then I want the new column to cll these out as EQUITY (for example).

I can create formula based on accounts that == but I don't the syntax for start with - or contains??

Any ideas on how best to achieve this ?

0 Kudos
8 Replies
Level 2
Author

Even if I could create a new column which extracts the first 3 digits from the account column then I could use that, but I am also unsure how to do this. In Excel I could use a LEFT commmand

Dataiker
Dataiker

Hi,

Are you trying to process data using prepare recipe? if yes I could suggest using a "Replace" step. It allows you to define a regular expression patterns, so for the '053' -> 'EQUITY' the rule would look like this:

 

Screenshot 2020-07-21 at 17.30.27.png

For the explanation of the demo expression that I used please refer to https://regex101.com/r/2mBStF/2

 

Regards

Andrey Avtomonov
R&D Engineer @ Dataiku
Level 6

@SeparateFiction 

In your description, it appears that you are actually searching text strings, not integers.  (I'm guessing this because you are referencing a number that begins with a "0".)  You might try an expression like this in a Formula step of a prepare recipe.  An expression like the below will yield a "true" or "false" boolean value.  

arrayContains(splitByLengths(id,3),"053")

This can then be wrapped in an if statement to get the result you need. 

if(arrayContains(splitByLengths(id,3),"053"),"EQUITY ","?")

Here is a similar example that shows just getting true and false.  That then could be used in an if() formula.

Formula to Search for SubString.jpg

 

 

 

--Tom
Level 2
Author

Hi - thanks for coming back to me. I am intersted in this formula and it is working well for the most part but I am having an issue when the account numbers contain only numbers.

So an account number is 8 digits and these are a combination of numbers and letters - sometimes only numbers and this is where the formula fails.

 

if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity,""))

This will work when the account # in ACCOUNT contains letters (e.g. 04056F4D)
But will faill when numbers only (e.g. 04045699)

 

The data type for ACCOUNT is string so I thought that this shouldn't be an issue?

 

Thanks again!

0 Kudos
Level 6

This formula looks to be missing a quotation mark. See the red " I added below.

if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity",""))

 

--Tom
0 Kudos
Level 6

What version of DSS are you using?  I'm using the latest version V7.0.3.

This formula in your post looks to be missing a quotation mark. See the red " I added below.

Also, the order of the parameters in the SplitByLengths() function seems to be backward I would assume you mean SplitByLengths(ACCOUNT,3) .  This reflects the first 3 characters of the ACCOUNT column.  

if(arraycontains(SplitByLengths(ACCOUNT,3),"040"),"Equity",""))

if none of that solves your problems you might wrap ACCOUNT in a strval() function.

if(arraycontains(SplitByLengths(strval("ACCOUNT"),3),"040"),"Equity",""))

Hope that helps.

P.S. you can find short descriptions of all of the formula functions here.

https://doc.dataiku.com/dss/7.0/advanced/formula.html

 

--Tom
0 Kudos
Level 2
Author

I have this working peachy now - thanks for your help!

Level 6

For others who might come to this post I invite you to go ahead and mark whichever posts helped you solve the issue as "Accept as Solution".  Or consider adding anything else you have learned so the next person who has this problem can find your result.

--Tom
0 Kudos