-
How to remove scientific notation in a column
Formatting numbers can often be a tedious data cleaning task. It can be made easier with the format() function of the DSS Formula language. This function takes a "printf format string" and applies it to any value. Format strings are immensely powerful, as they allow you to truncate strings, change precision, switch between…
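As a rough illustration of what a printf-style format string does to a value stored in scientific notation, here is a minimal Python sketch. It uses Python's `%` operator as a stand-in for the DSS `format()` function; the helper name `to_plain` and the sample values are assumptions made for this example, and the exact DSS formula syntax may differ.

```python
def to_plain(value: float, decimals: int) -> str:
    """Render a float in fixed-point notation, with no exponent part."""
    # "%.*f" reads the precision (number of decimals) from the argument tuple.
    return "%.*f" % (decimals, value)


small = to_plain(1.5e-05, 7)   # a value that would otherwise display as 1.5e-05
large = to_plain(1.23e10, 0)   # a value that would otherwise display as 1.23e+10

print(small)
print(large)
```

The same `%.Nf` pattern controls precision: raising or lowering the number of decimals changes how many digits are kept after the decimal point.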
-
How to copy a recipe in your Flow
Do you have recipes that you want to re-use elsewhere in a project? You can copy recipes from the Flow for use in the same project. From the Flow, click the recipe you want to copy, and a Copy action will appear in the Actions sidebar on the right. You will be asked to choose on which dataset the recipe should be applied…
-
How to segment your data using statistical quantiles
You can create statistical quantiles without code in Dataiku DSS in two ways:
* The Split recipe allows you to break down each quantile into separate datasets, so it can be useful if you’re planning to separately handle a small number of quantiles, such as quartiles or deciles.
* The Window recipe allows you to create a new…
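To make the idea of splitting data by quantile concrete, the following Python sketch groups a list of values into one bucket per quartile. It is only an analogy for what the Split recipe produces, not how DSS implements it; the function name `split_by_quantile` and the sample data are hypothetical.

```python
import bisect
import statistics


def split_by_quantile(values, n):
    """Group values into n buckets delimited by the n-1 quantile cut points."""
    cuts = statistics.quantiles(values, n=n)  # n-1 cut points
    groups = [[] for _ in range(n)]
    for v in values:
        # bisect_right counts how many cut points are <= v,
        # which is the index of the bucket v belongs to.
        groups[bisect.bisect_right(cuts, v)].append(v)
    return groups


# Hypothetical sample data: the integers 1 to 100, split into quartiles.
quartiles = split_by_quantile(list(range(1, 101)), 4)
sizes = [len(g) for g in quartiles]
print(sizes)
```

Each inner list plays the role of one output dataset; with deciles, `n` would be 10.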
-
How to pad a number with leading zeros
A common requirement when you have a column of numbers is to format all numbers so that they have the same length, adding leading zeros if needed. This can be done in the DSS preparation recipe using a Formula. The formula function to use is format. For example, to ensure that all values of the column mycolumn are padded…
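The zero-padding behaviour of a printf-style format can be sketched in Python as follows. Python's `%` operator stands in for the DSS `format` function here; the helper name `pad_zeros`, the width of 5, and the sample numbers are assumptions chosen for illustration, not values from the original article.

```python
def pad_zeros(number: int, width: int) -> str:
    """Left-pad an integer with zeros up to the given total width."""
    # "%0*d": the 0 flag requests zero padding, * takes the width from the tuple.
    return "%0*d" % (width, number)


# Hypothetical column values, padded to an assumed width of 5.
padded = [pad_zeros(n, 5) for n in (7, 42, 12345)]
print(padded)
```

Values that already reach the target width are left unchanged, so every result has the same length as long as no number is longer than the chosen width.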
-
How to standardize text fields using fuzzy values clustering
When working with large amounts of disparate, user-entered text data, we often need to standardize or collapse entries into a resolved form. For example, how can we get a computer to recognize that strings like "Abraham Lincoln", "Abe Lincoln", and "Abrahm Lincoln" are actually the same category? We want to map these close…
-
How Dataiku DSS Handles and Displays Date & Time
DSS and dates: In Dataiku DSS, a “date” means “an absolute point in time”, that is, something expressible as a date, a time, and a timezone. For example, 2001-01-20T14:00:00.000Z and 2001-01-20T16:00:00.000+0200 refer to the same point in time (14:00Z is 2pm UTC, and 16:00+0200 is 4pm in UTC+2, which is also 2pm UTC). DSS…
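The claim that the two timestamps above denote the same instant can be checked with a short Python sketch. This uses the standard `datetime` module rather than anything DSS-specific; the format string `FMT` is an assumption matching the two example strings.

```python
from datetime import datetime, timezone

# Assumed pattern for the two example strings: date, time, milliseconds, UTC offset.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

utc_stamp = datetime.strptime("2001-01-20T14:00:00.000Z", FMT)
offset_stamp = datetime.strptime("2001-01-20T16:00:00.000+0200", FMT)

# Timezone-aware datetimes compare by the instant they represent,
# not by their wall-clock fields.
same_instant = utc_stamp == offset_stamp
print(same_instant)

# Converting the second value to UTC shows the 2pm wall-clock time.
print(offset_stamp.astimezone(timezone.utc).hour)
```

The wall-clock hours differ (14 versus 16), but the comparison is made on the absolute point in time, which is the notion of “date” described above.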
-
How to copy-paste Prepare recipe steps
Often you may want to reuse the steps of a Prepare recipe in another location. Of course, it is possible to copy the entire Prepare recipe, as with any visual recipe. However, it is also possible to copy any number of individual Prepare recipe steps without recreating the entire recipe. First, select the needed steps from…