## Sign up to take part

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forÂ

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

- Community
- Â»
- Discussions
- Â»
- Using Dataiku
- Â»

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Formula for if an entry starts with

Hello - I am looking to add a formula in a data recipe which will classify accounts based on their first 3 digits.

So for account numbers that start with '053' then I want the new column to cll these out as EQUITY (for example).

I can create formula based on accounts that == but I don't the syntax for start with - or contains??

Any ideas on how best to achieve this ?

Solutions shown first - Read whole discussion

8 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

Are you trying to process data using prepare recipe? if yes I could suggest using a "Replace" step. It allows you to define a regular expression patterns, so for the '053' -> 'EQUITY' the rule would look like this:

For the explanation of the demo expression that I used please refer to https://regex101.com/r/2mBStF/2

Regards

Andrey Avtomonov

R&D Engineer @ Dataiku

R&D Engineer @ Dataiku

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

In your description, it appears that you are actually searching text strings, not integers. (I'm guessing this because you are referencing a number that begins with a "0".) You might try an expression like this in a Formula step of a prepare recipe. An expression like the below will yield a "true" or "false" boolean value.

arrayContains(splitByLengths(id,3),"053")

This can then be wrapped in an if statement to get the result you need.

if(arrayContains(splitByLengths(id,3),"053"),"EQUITY ","?")

Here is a similar example that shows just getting true and false. That then could be used in an if() formula.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi - thanks for coming back to me. I am intersted in this formula and it is working well for the most part but I am having an issue when the account numbers contain only numbers.

So an account number is 8 digits and these are a combination of numbers and letters - sometimes only numbers and this is where the formula fails.

if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity,""))

This will work when the account # in ACCOUNT contains letters (e.g. 04056F4D)

But will faill when numbers only (e.g. 04045699)

The data type for ACCOUNT is string so I thought that this shouldn't be an issue?

Thanks again!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

This formula looks to be missing a quotation mark. See the red " I added below.

if(arraycontains(SplitByLengths(3,ACCOUNT),"040"),"Equity**"**,""))

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

What version of DSS are you using? I'm using the latest version V7.0.3.

This formula in your post looks to be missing a quotation mark. See the red **"** I added below.

Also, the order of the parameters in the SplitByLengths() function seems to be backward I would assume you mean **SplitByLengths(ACCOUNT,3)** . This reflects the first 3 characters of the ACCOUNT column.

`if(arraycontains(`**SplitByLengths(ACCOUNT,3)**,"040"),"Equity**"**,""))

if none of that solves your problems you might wrap ACCOUNT in a **strval()** function.

`if(arraycontains(`**SplitByLengths(strval("ACCOUNT"),3)**,"040"),"Equity**"**,""))

Hope that helps.

P.S. you can find short descriptions of all of the formula functions here.

https://doc.dataiku.com/dss/7.0/advanced/formula.html

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

I have this working peachy now - thanks for your help!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

For others who might come to this post I invite you to go ahead and mark whichever posts helped you solve the issue as "Accept as Solution". Or consider adding anything else you have learned so the next person who has this problem can find your result.

--Tom