
Remove Default Scientific Notation

Datasets default to outputting numbers in scientific notation, even if the recipe calls for a different format. This is a particular hindrance when exporting CSV files, because the scientific notation is not recognized as a number. This forces Excel downloads, which are much larger and more cumbersome to work with. A standard default number format, or the ability to change default preferences, would be very welcome.

Community Manager

Thanks for your idea @mar! Just in case you haven't already seen this, I wanted to make this resource available to you in our Knowledge Base: How to remove scientific notation in a column

I hope this helps!

Level 2

Thank you @CoreyS. That helps, but I am working with datasets that have 200+ columns and can't feasibly use this trick for each one individually. Is there a way to more broadly alter the formatting of a dataset?
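For readers facing the same many-column problem, here is a minimal sketch of formatting every numeric value in one pass rather than column by column. It uses only the Python standard library and is not a Dataiku feature or API; the function names `plain_number` and `write_plain_csv` are hypothetical, and it assumes the rows are available as plain Python lists, for example in a Python step before export.

```python
import csv
import io
from decimal import Decimal


def plain_number(value):
    """Return numbers as fixed-point strings; pass other values through unchanged."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value
    # NaN and infinities have no fixed-point form; keep their default text.
    if isinstance(value, float) and (value != value or value in (float("inf"), float("-inf"))):
        return str(value)
    # repr() gives the shortest round-trip digits; the 'f' format drops the exponent.
    return format(Decimal(repr(value)), "f")


def write_plain_csv(rows, header, out):
    """Write rows to a file-like object, applying plain_number to every cell."""
    writer = csv.writer(out)
    writer.writerow(header)
    for row in rows:
        writer.writerow([plain_number(cell) for cell in row])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_plain_csv([[1.23e16, 1e-07, "abc"]], ["big", "small", "label"], buffer)
    print(buffer.getvalue())
```

Because the formatting is applied per cell, the same call covers 2 columns or 200 without listing them individually.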


It would be great if the default formatting of numbers and dates could be defined at a project level. Dataiku also converts ints to floats by default and does some date conversions by default. The main struggle for me is that Dataiku's default interpretation of numbers is to assume they're continuous values, but, at least in my company, at least half of the numbers we deal with every day are identifiers that should be treated categorically rather than numerically. I run into the scientific notation issue pretty frequently with part numbers, which are very long integers. Order numbers are also usually long integers. While forcing the types to strings usually resolves these issues, it'd be nice to have some global controls at the project level that set the default interpretations and formats, or even to be able to set complex rules about when to apply which formatting. 
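To illustrate why forcing identifiers to strings resolves the issue, here is a small sketch in plain Python. It is not Dataiku's import logic; the function `load_rows`, the `id_columns` parameter, and the column names in the sample are all hypothetical. Columns named as identifiers are kept as text, and everything else is parsed as a number where possible.

```python
import csv
import io


def load_rows(text, id_columns):
    """Parse CSV text, keeping identifier columns as strings."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        parsed = {}
        for name, raw in record.items():
            if name in id_columns:
                # Identifiers are categorical: keep the exact digits as text.
                parsed[name] = raw
            else:
                try:
                    parsed[name] = float(raw)
                except (TypeError, ValueError):
                    # Leave non-numeric or missing values untouched.
                    parsed[name] = raw
        rows.append(parsed)
    return rows


if __name__ == "__main__":
    sample = "part_number,quantity\n12345678901234567890,3\n"
    print(load_rows(sample, {"part_number"}))
    # For comparison: parsed as a float, the same part number is displayed
    # with an exponent and can no longer be matched digit for digit.
    print(float("12345678901234567890"))
```

A project-level default, as requested above, would amount to supplying something like `id_columns` once instead of fixing each dataset by hand.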


Similar to this product idea, I would also like to invite the development team to improve the import defaults for text files and how data columns are typed for processing throughout the pipeline.

How data columns are being auto typed


If we are looking to support “Everyday AI”, I think we need to keep little things like the display and “typing” of columns from being a multi-step, difficult process for beginning users. The default experience should not assume knowledge of Python data structures and data object conventions; it should instead be closer to other data and number handling applications on the market.