-
How to standardize text fields using fuzzy values clustering
When working with large amounts of disparate, user-entered text data, we often need to standardize or collapse entries into a resolved form. For example, how can we get a computer to recognize that strings like "Abraham Lincoln", "Abe Lincoln", and "Abrahm Lincoln" are actually the same category? We want to map these close…
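The idea of collapsing near-duplicate strings can be sketched in plain Python. This is only a minimal illustration, not the clustering method Dataiku DSS uses: it relies on the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names (`cluster_values`, `standardize`) and the 0.7 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_values(values, threshold=0.7):
    """Greedily group values; the first member of each cluster is its representative."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([value])
    return clusters


def standardize(values, threshold=0.7):
    """Map every value to the representative of its cluster."""
    mapping = {}
    for cluster in cluster_values(values, threshold):
        for value in cluster:
            mapping[value] = cluster[0]
    return mapping


names = ["Abraham Lincoln", "Abe Lincoln", "Abrahm Lincoln", "George Washington"]
mapping = standardize(names)
```

With this threshold, the three Lincoln variants map to the first spelling encountered, while "George Washington" stays in its own cluster. The greedy approach depends on input order and on the threshold, so results should be reviewed before being applied.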
-
How to copy a recipe in your Flow
Do you have recipes that you want to reuse elsewhere in a project? You can copy recipes from the Flow for use in the same project. From the Flow, click the recipe you want to copy, and a Copy action will appear in the Actions sidebar on the right. You will be asked to choose on which dataset the recipe should be applied…
-
How to segment your data using statistical quantiles
You can create statistical quantiles without code in Dataiku DSS in two ways: * The Split recipe allows you to break down each quantile into separate datasets, so it can be useful if you’re planning to separately handle a small number of quantiles like quartiles or deciles. * The Window recipe allows you to create a new…
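For readers who want to see what quantile segmentation computes, here is a rough plain-Python sketch. It does not reproduce the Split or Window recipes; it only shows the underlying idea using the standard library's `statistics.quantiles`, and the helper name `assign_quantiles` is an assumption made for this example.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_quantiles(values, n=4):
    """Return, for each value, its quantile bucket numbered 1..n."""
    # quantiles() returns the n - 1 cut points that split the data into n groups.
    cut_points = quantiles(values, n=n)
    # Count how many cut points lie strictly below the value.
    return [bisect_left(cut_points, v) + 1 for v in values]


values = list(range(1, 101))
buckets = assign_quantiles(values, n=4)

# Collect the values by quartile, similar in spirit to splitting into separate datasets.
groups = {q: [v for v, b in zip(values, buckets) if b == q] for q in range(1, 5)}
```

Passing `n=10` instead would produce deciles in the same way.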
-
In a formula, how to check if a variable belongs to a set of values
A common need when writing formulas is to check whether a variable (generally, a column) belongs to a set of values. For example, you may want to check if "mycolumn" has the value 1, 2, or 3. To do that in a formula, use the arrayContains function: arrayContains([1,2,3], mycolumn). In other words, instead of checking if "mycolumn…
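The same membership test can be expressed in plain Python, which may help when comparing formula results against a script. This is only an analogy to the `arrayContains([1,2,3], mycolumn)` formula above; the names `ALLOWED_VALUES` and `belongs_to_set` are made up for this sketch.

```python
# Hypothetical set of accepted values, mirroring [1,2,3] in the formula.
ALLOWED_VALUES = {1, 2, 3}


def belongs_to_set(value, allowed=ALLOWED_VALUES):
    """Return True when the value is one of the allowed values."""
    return value in allowed


# Sample column values, evaluated one row at a time.
column = [1, 4, 3, 7]
flags = [belongs_to_set(v) for v in column]
```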
-
How to fill empty cells of a column with the value of the corresponding row from another column
Handling missing data is one data preparation challenge that analysts routinely face. Should you discard observations with missing values or perhaps impute missing values with a summary value like the median? To handle missing data, the Prepare recipe has dozens of built-in processors ready to solve many of the most common…
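The operation named in the title, filling empty cells of one column from the same row of another column, can be sketched in plain Python as follows. This is not the Prepare recipe processor itself; the function name `fill_from_column`, the sample column names, and the choice to treat both `None` and the empty string as "empty" are assumptions made for illustration.

```python
def fill_from_column(rows, target, source):
    """Return new rows where empty `target` cells take the value of `source`."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row.get(target) in (None, ""):
            new_row[target] = new_row.get(source)
        filled.append(new_row)
    return filled


rows = [
    {"nickname": "Abe", "first_name": "Abraham"},
    {"nickname": "", "first_name": "George"},
    {"nickname": None, "first_name": "Ulysses"},
]
result = fill_from_column(rows, target="nickname", source="first_name")
```

Rows that already have a value in the target column are left unchanged; only the empty cells are filled.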
-
How to copy-paste Prepare recipe steps
Often you may want to reuse the steps of a Prepare recipe in another location. Of course, it is possible to copy the entire Prepare recipe, as with any visual recipe. However, it is also possible to copy any number of individual Prepare recipe steps without recreating the entire recipe. First, select the needed steps from…
-
How to reorder or hide the columns of a dataset
In many cases, you might want to reorder the columns of a dataset. In other situations, you might just want to temporarily hide columns from view. Both actions can easily be achieved with Dataiku DSS. Reorder Columns: In a Prepare recipe (or a Visual Analysis in the Lab), use the Move columns processor to alter the order of…