why concat function in dataiku erases zeros?

Solved!
PANKAJ
Level 3
why concat function in dataiku erases zeros?

When I'm creating a new column using formula inside the prepare recipe to Concat two columns. 

Supppose, the col_1 has value '00' and col_2 has value '000000' then contact(col_1,col_2) should be '00000000',but I'm getting '00' as output in the new column. & if col_1 has value 'S' and col_2 has value '000125' then it should be 'S000125', but it is coming 'S125'.

I've changed the meaning of col_1 and col_2 as 'Text' still getting the same result.

Can anyone help? 

0 Kudos
1 Solution
fchataigner2
Dataiker

Hi,

in formulas, referring to columns directly using the column name is translated as "read the column value and convert to numeric value if applicable", so this will naturally lose the leading zeroes. To keep them, you need to explicitely tell the formula to retrieve the values as strings by using `concat(strval("col_1"), strval("col_2"))` (note that the column name is passed as a literal to the `strval()` function)

View solution in original post

4 Replies
fchataigner2
Dataiker

Hi,

in formulas, referring to columns directly using the column name is translated as "read the column value and convert to numeric value if applicable", so this will naturally lose the leading zeroes. To keep them, you need to explicitely tell the formula to retrieve the values as strings by using `concat(strval("col_1"), strval("col_2"))` (note that the column name is passed as a literal to the `strval()` function)

PANKAJ
Level 3
Author

Thanks....It's working.

Can you add inputs on how to add leading zeros if I need them while doing concatenation?

0 Kudos
tgb417

@PANKAJ ,

Here is a simple approach that helped me clean up some issues that I discovered in the Conundrum 14 dataset missing leading 0s in the ZipCode Column.

CleanUp US Postal Codes.jpg

 

--Tom
0 Kudos
fchataigner2
Dataiker

for a column containing integers, you can use a formula and ask for a string format. For example 

format("%05d", col_1)

will take col_1 = 33   and produce 00033

If the column contains mixed types, like some textual codes and some integral codes, the approach of concatenating the appropriate number of zeroes is the way to go. A slightly simpler way of doing it is to make use of the fact that substring() can take negative arguments to indicate "index from the end". For example to pad with zeroes up to 5 characters:

concat("00000", col_1).substring(-5)

0 Kudos