
Remove Default Scientific Notation

Datasets default to outputting numbers in scientific notation, even if the recipe calls for a different format. This is a hindrance, especially when exporting CSV files, as the scientific notation will not be recognized as a number. This forces Excel downloads, which are much larger and more cumbersome to work with. Having a standard number format as the default, or the ability to change default preferences, would be very welcome.

Dataiker Alumni

Thanks for your idea @mar! Just in case you haven't already seen this, I wanted to make this resource available to you in our Knowledge Base: How to remove scientific notation in a column

I hope this helps!

Level 2

Thank you @CoreyS. That helps, but I am working with datasets that have 200+ columns and can't feasibly use this trick for each one individually. Is there a way to more broadly alter the formatting of a dataset?

It would be great if the default formatting of numbers and dates could be defined at a project level. Dataiku also converts ints to floats by default and does some date conversions by default. The main struggle for me is that Dataiku's default interpretation of numbers is to assume they're continuous values, but, at least in my company, at least half of the numbers we deal with every day are identifiers that should be treated categorically rather than numerically. I run into the scientific notation issue pretty frequently with part numbers, which are very long integers. Order numbers are also usually long integers. While forcing the types to strings usually resolves these issues, it'd be nice to have some global controls at the project level that set the default interpretations and formats, or even to be able to set complex rules about when to apply which formatting. 
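As a rough illustration of the "force to strings" workaround described above, here is a minimal sketch in plain Python (standard library only, not a Dataiku API). It assumes the rows are already available as Python lists; the helper names `plain_number` and `rows_to_csv` and the sample columns are hypothetical. It renders every float as a plain decimal string before writing CSV, so it can be applied to all columns at once rather than one at a time.

```python
import csv
import io
from decimal import Decimal


def plain_number(value):
    """Render a float without scientific notation; pass other values through as text."""
    if isinstance(value, float):
        # Decimal(repr(value)) keeps the float's shortest round-trip digits,
        # and the "f" format code never switches to exponent notation.
        return format(Decimal(repr(value)), "f")
    return str(value)


def rows_to_csv(header, rows):
    """Write rows to CSV text, applying plain_number to every cell in every column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([plain_number(cell) for cell in row])
    return buffer.getvalue()


# Hypothetical sample: a long identifier that was read as a float, plus a small value.
sample = rows_to_csv(
    ["part_number", "tolerance"],
    [[1.5e16, 1e-07]],
)
print(sample)
```

This only changes how values are written out. Identifiers that exceed float precision still need to be read as strings in the first place, since digits lost at import time cannot be recovered by formatting.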

Similar to this product idea, I would also like to invite the development team to improve the import defaults for text files, and how data columns are typed for processing throughout the pipeline.

How data columns are being auto typed

If we are looking to support “Everyday AI”, I think we need to keep the little things, like the display and “typing” of columns, from being a multi-step, difficult process for beginning users. The default experience should not assume knowledge of Python data structures and data object conventions, and should instead be closer to other data and number handling applications on the market.

Status changed to: Acknowledged

Thanks for the feedback. As of version 11.1, the default threshold for scientific notation has been increased from 10E7 & 10E-7 to 10E15 & 10E-15, meaning you will only see scientific notation for numbers that are very large or very small (and not frequently used). See note here:

We have logged this additional improvement request for more manual control, and I will let you know if we have any updates.