Submit your inspiring success story or innovative use case to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

Extract large value

Qris
Level 1
Extract large value

Hi all, 

I have a column with both numbers and text and I'd like to extract the values to a new column. Everything works fine, except for large numbers. The original file is an Excel file with around 400k rows and only a handful of values are large. For this task I use the prepare recipe -> “Extract a number”. The number in Excel is listed as (for example) 1.7099041E12. The prepare recipe makes it ~1,7 instead of ~1.700.000.000.000. What recipe do I need to use or how can I fix this?

Regards,

Qris

 

0 Kudos
1 Reply
Jurre
Neuron
Neuron

Hi @Qris ,

My approach would be to look for a suitable separator between textvalue and numeric value, and then split that column on that separator. A space for example. To find that possible separator i would temporarily filter out everything below a certain amount of characters. Tip : resulting columns from splitting can be limited with the "truncate" option. Another approach might be to filter on values with the structure number(s)-E-number(s) with some regex, and extract those to a new column. 

Edit : you mentioned a low volume of problematic values. it might be helpful to split the dataset in a part which works out fine and a part with those problematic values, for example using that number(s)-E-number(s) format. Analyzing a smaller record-set is faster and avoids the pitfall of not adjusting the sample size when analyzing. 

Jurre

0 Kudos