Remove Default Scientific Notation

mar Registered Posts: 2 ✭✭✭✭

Datasets default to outputting numbers in scientific notation, even if the recipe calls for a different format. This is a particular hindrance when exporting CSV files, as the scientific notation will not be recognized as a number. This forces Excel downloads, which are much larger and more cumbersome to work with. Having a standard number format default, or the ability to change default preferences, would be very welcome.

7 votes

In the Backlog

Comments

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Thanks for your idea, @mar! Just in case you haven't already seen this, I wanted to make this resource available to you in our Knowledge Base: How to remove scientific notation in a column

    I hope this helps!

  • mar
    mar Registered Posts: 2 ✭✭✭✭

    Thank you, @CoreyS. That helps, but I am working with datasets that have 200+ columns and can't feasibly use this trick for each one individually. Is there a way to alter the formatting of a dataset more broadly?

  • natejgardner
    natejgardner Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 151 Neuron

    It would be great if the default formatting of numbers and dates could be defined at a project level. Dataiku also converts ints to floats by default and does some date conversions by default. The main struggle for me is that Dataiku's default interpretation of numbers is to assume they're continuous values, but, at least in my company, at least half of the numbers we deal with every day are identifiers that should be treated categorically rather than numerically. I run into the scientific notation issue pretty frequently with part numbers, which are very long integers. Order numbers are also usually long integers. While forcing the types to strings usually resolves these issues, it'd be nice to have some global controls at the project level that set the default interpretations and formats, or even to be able to set complex rules about when to apply which formatting.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    Similar to this product idea, I'd also like to invite the development team to improve the import defaults for text files and how data columns are typed for processing throughout the pipeline.

    How data columns are being auto typed
    https://community.dataiku.com/t5/Product-Ideas/The-ability-to-turn-off-Cell-level-quot-Duck-Typing-quot-within/idi-p/16792

    https://community.dataiku.com/t5/Product-Ideas/Better-Parsing-of-Numbers-from-Text-Files/idi-p/12753

    If we are looking to support “Everyday AI”, I think we need to keep little things like the display and “typing” of columns from being a multi-step, difficult process for beginning users, one that assumes some knowledge of Python data structures and data object conventions. Instead, the default experience should be closer to other data and number handling applications on the market.

  • Katie
    Katie Dataiker, Registered, Product Ideas Manager Posts: 106 Dataiker

    Thanks for the feedback. As of version 11.1, the default threshold for scientific notation has been increased from 10E7 & 10E-7 to 10E15 & 10E-15, meaning you will only see scientific notation for numbers that are very large or very small (and not frequently used). See note here: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-remove-scientific-notation-in-a-column.html

    We have logged this additional improvement, for more manual control, and I will let you know if we have any updates here.
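
For anyone who needs a stopgap for wide CSV exports in the meantime, here is a minimal sketch of one possible workaround outside of DSS: post-process the exported file and expand every cell that is written in scientific notation, whatever the number of columns. It uses only the Python standard library. The names `SCI_NOTATION`, `expand_cell`, and `expand_csv` and the sample data are hypothetical, not part of any Dataiku API. Note the assumptions: digits that were already dropped when the value was written (for example a long part number exported as 1.23E17) cannot be recovered, and a text identifier that happens to look like `1E5` would also be expanded.

```python
import csv
import io
import re
from decimal import Decimal

# Matches cells such as 1.5E7, -2e-07 or 3.0E+15.
# Hypothetical pattern: adjust it to what your export actually contains.
SCI_NOTATION = re.compile(r"^[+-]?\d+(?:\.\d+)?[eE][+-]?\d+$")


def expand_cell(cell: str) -> str:
    """Return the cell in plain decimal notation if it is in scientific notation."""
    stripped = cell.strip()
    if not SCI_NOTATION.match(stripped):
        return cell  # leave text, dates, and plain numbers untouched
    # Decimal parses the exponent exactly; the "f" format never uses an exponent.
    text = format(Decimal(stripped), "f")
    # Drop trailing zeros and a dangling decimal point (e.g. "15.0" becomes "15").
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


def expand_csv(source: str) -> str:
    """Rewrite every cell of a CSV document, regardless of how many columns it has."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(source)):
        writer.writerow([expand_cell(cell) for cell in row])
    return out.getvalue()


# Made-up sample data for illustration only.
sample = "part_number,weight\n1.23456789012E11,2.5E-7\n42,hello\n"
print(expand_csv(sample))
```

Because the rewrite works cell by cell, there is no per-column configuration to maintain, which is the main appeal for datasets with hundreds of columns. For identifier columns, forcing the type to string upstream, as suggested earlier in the thread, is still the safer fix since it avoids the precision loss entirely.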
