Join us on July 16th as we explore real-world Reinforcement Learning Learn more

Append leading zeroes to a nunber

Level 1
Append leading zeroes to a nunber

Hi,

I have a columns which is an integer and I want to make it consistent length of 13 and append leading zeroes to make it 13.

Secondly I also want to make sure that the first three places are 001 followed by some leading zeroes and then the number.

Example : 

Test Column
----------------

9894567 - Output should be 0010009894567 

123567891 - Output should be 0010123567891 

 

any help?

0 Kudos
1 Reply
Dataiker
Dataiker

Hi,

So there are definitely several ways you can accomplish this through DSS! 

If you are a "coder", then one option would be to use Python or R via a code recipe to read the dataset into a data frame and then use code to handle the transformation that you need:
https://doc.dataiku.com/dss/latest/code_recipes/index.html

However, for the sake of argument, let's say that you are not a "coder" but a "clicker" instead. You can still accomplish this by utilizing various processors through a Prepare recipe, which I will walk through below. 

  1. Per the example you provided, let's say we had the following sample dataset (in the Prepare recipe I had first changed the storage type and meaning of this column to String and Text respectively):
    Screen Shot 2020-02-05 at 6.24.14 PM.png

  2. We can then create a secondary temp column called Temp and filled it with "0010000000000" (which is a generic "001" followed by zeros with a total string length of 13 as you've described) by using the Fill a column with a constant value processor. 
    Screen Shot 2020-02-05 at 6.25.11 PM.png

  3. We can then add a formula to split this Temp column based on the actual length of the Test_Column number/string (so that we can concatenate it back together later) by using the splitByLengths() processor and create a new Temp2 column with the result. Now there's a couple additional points worth noting:
    - This formula includes an IF condition to check whether the original number exceeds a length of 13 characters. In case it does, it just takes the original number and trims it to use just the first 13 digits. If this corner case is not a concern, then you can simplify the formula and not make it so complex. 
    - The reason strval() is needed is to prevent DSS from automatically typing a numerical variable and to treat it as a string (so you don't lose the leading zeroes in front). More information about why this is needed can be found in our documentation here
    Screen Shot 2020-02-05 at 6.39.31 PM.png

  4. As the splitByLengths() formula function produces an array, we'll need to extract the underlying number/string from the array. This can be done by using the Extract from array processor. 
    Screen Shot 2020-02-05 at 6.39.57 PM.png

  5. We then need to add one final formula to concatenate the results and produce the final string, which is an 13-digit integer that starts with "001", then contains zeroes with the original number appended at the end per our initial goal. Again, we include an IF condition here just in case the original number is longer than 13 digits (as a corner case), in which case we use the first 13 digits of this number instead. 
    Screen Shot 2020-02-05 at 6.45.24 PM.png

  6. Finally, we remove the two temp columns and run the recipe. Voila!
    Screen Shot 2020-02-05 at 6.46.29 PM.png

I hope this example proves helpful and please keep in mind that you might need to adjust or add steps/processors/formulas as necessary depending on your particular use case. As mentioned earlier, this just illustrates one way this can be done using non-coding options that can be found in DSS. If you are a data scientist or a coder, then there is always the option to write your own code instead via a code recipe! 🙂

Cheers,
Andrew