## Sign up to take part

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Solved!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

why concat function in dataiku erases zeros?

When I'm creating a new column using formula inside the prepare recipe to Concat two columns.

Supppose, the col_1 has value '00' and col_2 has value '000000' then contact(col_1,col_2) should be '00000000',but I'm getting '00' as output in the new column. & if col_1 has value 'S' and col_2 has value '000125' then it should be 'S000125', but it is coming 'S125'.

I've changed the meaning of col_1 and col_2 as 'Text' still getting the same result.

Can anyone help?

1 Solution

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

in formulas, referring to columns directly using the column name is translated as "read the column value and convert to numeric value if applicable", so this will naturally lose the leading zeroes. To keep them, you need to explicitely tell the formula to retrieve the values as strings by using `concat(strval("col_1"), strval("col_2"))` (note that the column name is passed as a literal to the `strval()` function)

Solutions shown first - Read whole discussion

4 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

in formulas, referring to columns directly using the column name is translated as "read the column value and convert to numeric value if applicable", so this will naturally lose the leading zeroes. To keep them, you need to explicitely tell the formula to retrieve the values as strings by using `concat(strval("col_1"), strval("col_2"))` (note that the column name is passed as a literal to the `strval()` function)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Thanks....It's working.

Can you add inputs on how to add leading zeros if I need them while doing concatenation?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

@PANKAJ ,

Here is a simple approach that helped me clean up some issues that I discovered in the Conundrum 14 dataset missing leading 0s in the ZipCode Column.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

for a column containing integers, you can use a formula and ask for a string format. For example

format("%05d", col_1)

will take col_1 = 33 and produce 00033

If the column contains mixed types, like some textual codes and some integral codes, the approach of concatenating the appropriate number of zeroes is the way to go. A slightly simpler way of doing it is to make use of the fact that substring() can take negative arguments to indicate "index from the end". For example to pad with zeroes up to 5 characters:

concat("00000", col_1).substring(-5)