Extract large value

Options
Qris
Qris Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer Posts: 1 ✭✭✭

Hi all,

I have a column with both numbers and text and I'd like to extract the values to a new column. Everything works fine, except for large numbers. The original file is an Excel file with around 400k rows and only a handful of values are large. For this task I use the prepare recipe -> “Extract a number”. The number in Excel is listed as (for example) 1.7099041E12. The prepare recipe makes it ~1,7 instead of ~1.700.000.000.000. What recipe do I need to use or how can I fix this?

Regards,

Qris

Answers

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭
    Options

    Hi @Qris
    ,

    My approach would be to look for a suitable separator between textvalue and numeric value, and then split that column on that separator. A space for example. To find that possible separator i would temporarily filter out everything below a certain amount of characters. Tip : resulting columns from splitting can be limited with the "truncate" option. Another approach might be to filter on values with the structure number(s)-E-number(s) with some regex, and extract those to a new column.

    Edit : you mentioned a low volume of problematic values. it might be helpful to split the dataset in a part which works out fine and a part with those problematic values, for example using that number(s)-E-number(s) format. Analyzing a smaller record-set is faster and avoids the pitfall of not adjusting the sample size when analyzing.

    Jurre

Setup Info
    Tags
      Help me…