Remove Default Scientific Notation

Datasets output numbers in scientific notation by default, even when the recipe calls for a different format. This is a particular hindrance when exporting CSV files, because the scientific notation is not recognized as a number. That forces Excel downloads, which are much larger and more cumbersome to work with. A standard default number format, or the ability to change default preferences, would be very welcome.

5 Comments
CoreyS
Dataiker Alumni

Thanks for your idea @mar! Just in case you haven't already seen it, I wanted to point you to this resource in our Knowledge Base: How to remove scientific notation in a column

I hope this helps!



mar
Level 2

Thank you @CoreyS. That helps, but I am working with datasets that have 200+ columns and can't feasibly apply this trick to each one individually. Is there a way to alter the formatting of a whole dataset more broadly?
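
For readers with the same many-columns problem, here is a minimal sketch of one possible workaround outside the Prepare recipe, assuming the data is available as a pandas DataFrame (for example inside a Python recipe). It is not a Dataiku setting. The function name, the sample column names, and the `decimals` parameter are hypothetical; `float_format` is the standard pandas `to_csv` option and applies to every float column at once.

```python
import pandas as pd


def to_csv_without_scientific_notation(df: pd.DataFrame, decimals: int = 6) -> str:
    """Return CSV text with every float column written in fixed-point notation."""
    # float_format is applied to all float columns in one pass,
    # so no per-column formatting step is needed.
    return df.to_csv(index=False, float_format=f"%.{decimals}f")


# Hypothetical sample data: one float column and one integer column.
df = pd.DataFrame({"part_value": [1.5e10, 2.0e-5], "qty": [3, 4]})
csv_text = to_csv_without_scientific_notation(df)
print(csv_text)
```

The trade-off is that every float gets the same number of decimals, so values are padded with trailing zeros and anything smaller than the chosen precision is rounded to zero.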


It would be great if the default formatting of numbers and dates could be defined at the project level. Dataiku also converts ints to floats by default and does some date conversions by default. The main struggle for me is that Dataiku's default interpretation of numbers is to assume they're continuous values, but at my company at least half of the numbers we deal with every day are identifiers that should be treated categorically rather than numerically. I run into the scientific notation issue pretty frequently with part numbers, which are very long integers. Order numbers are also usually long integers. While forcing the types to strings usually resolves these issues, it'd be nice to have some global controls at the project level that set the default interpretations and formats, or even to be able to set complex rules about when to apply which formatting.
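
The "force the types to strings" workaround mentioned above can be sketched in plain pandas as follows. This is only an illustration under assumptions: the column names and sample values are made up, and it shows the general technique rather than any Dataiku project-level control.

```python
import io

import pandas as pd

# Hypothetical CSV with long-integer identifiers and one true numeric column.
raw = io.StringIO(
    "part_number,order_number,price\n"
    "12345678901234567890,9876543210123,19.99\n"
)

# Hypothetical list of columns that are identifiers, not quantities.
ID_COLUMNS = ["part_number", "order_number"]

# Reading identifier columns as str keeps every digit and prevents
# float conversion, so scientific notation cannot appear on export.
df = pd.read_csv(raw, dtype={col: str for col in ID_COLUMNS})

exported = df.to_csv(index=False)
print(exported)
```

Columns not listed in `ID_COLUMNS`, such as `price` here, are still inferred as numeric, so only the identifiers are treated categorically.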


Similar to this product idea, I would also like to invite the development team to improve the import defaults for text files, and how data columns are typed for processing throughout the pipeline.

How data columns are being auto typed
https://community.dataiku.com/t5/Product-Ideas/The-ability-to-turn-off-Cell-level-quot-Duck-Typing-q...

https://community.dataiku.com/t5/Product-Ideas/Better-Parsing-of-Numbers-from-Text-Files/idi-p/12753

If we are looking to support "Everyday AI", I think we need to keep the little things, like the display and "typing" of columns, from being a multi-step, difficult process for beginning users. The process should not assume knowledge of Python data structures and data object conventions, and should instead allow a default experience closer to other data and number handling applications on the market.

--Tom


ktgross15
Dataiker

Thanks for the feedback. As of version 11.1, the default threshold for scientific notation has been increased from 10E7 & 10E-7 to 10E15 & 10E-15, meaning you will only see scientific notation for numbers that are quite large (and not frequently used). See the note here: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-remove-scientific-notation-i...

We have logged this additional improvement request for more manual control, and I will let you know if we have any updates.

Status changed to: In the Backlog
