why concat function in dataiku erases zeros?

PANKAJ
PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

When I'm creating a new column using formula inside the prepare recipe to Concat two columns.

Supppose, the col_1 has value '00' and col_2 has value '000000' then contact(col_1,col_2) should be '00000000',but I'm getting '00' as output in the new column. & if col_1 has value 'S' and col_2 has value '000125' then it should be 'S000125', but it is coming 'S125'.

I've changed the meaning of col_1 and col_2 as 'Text' still getting the same result.

Can anyone help?

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    Hi,

    in formulas, referring to columns directly using the column name is translated as "read the column value and convert to numeric value if applicable", so this will naturally lose the leading zeroes. To keep them, you need to explicitely tell the formula to retrieve the values as strings by using `concat(strval("col_1"), strval("col_2"))` (note that the column name is passed as a literal to the `strval()` function)

Answers

  • PANKAJ
    PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

    Thanks....It's working.

    Can you add inputs on how to add leading zeros if I need them while doing concatenation?

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    @PANKAJ
    ,

    Here is a simple approach that helped me clean up some issues that I discovered in the Conundrum 14 dataset missing leading 0s in the ZipCode Column.

    CleanUp US Postal Codes.jpg

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    for a column containing integers, you can use a formula and ask for a string format. For example

    format("%05d", col_1)

    will take col_1 = 33 and produce 00033

    If the column contains mixed types, like some textual codes and some integral codes, the approach of concatenating the appropriate number of zeroes is the way to go. A slightly simpler way of doing it is to make use of the fact that substring() can take negative arguments to indicate "index from the end". For example to pad with zeroes up to 5 characters:

    concat("00000", col_1).substring(-5)

Setup Info
    Tags
      Help me…