How to fill empty cells of a column with the value of the corresponding row from another column

Handling missing data is one data preparation challenge that analysts routinely face. Should you discard observations with missing values or perhaps impute missing values with a summary value like the median? 

To handle missing data, the Prepare recipe has dozens of built-in processors ready to solve many of the most common challenges without any coding. In addition, Dataiku DSS has its own Formula language to craft more custom solutions.

For example, in some cases, you may want to fill the empty cells of a column with values of the corresponding rows from another column. 

In a Prepare recipe, use the Formula processor with the `coalesce()` function as shown below:

[Image: kb-coalesce-1.png]

Here we fill the empty values of `col1` with the corresponding values of `col2` in a new column.
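For readers without the screenshot, the expression entered in the Formula processor would presumably look like the following. This is a sketch that assumes the columns are literally named `col1` and `col2`, as in the caption. The output column is chosen in the processor settings, and its name is up to you.

```
coalesce(col1, col2)
```

`coalesce()` returns its first non-empty argument, so rows where `col1` already has a value keep that value, and only rows with an empty `col1` fall back to `col2`.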

Instead of falling back to another column, you can also supply a fixed replacement value directly.

[Image: kb-coalesce-2.png]

Here we fill the empty values of `col1` with the value of `0`.
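Under the same assumption about the column name, the constant-fill variant would likely be written with the literal as the second argument:

```
coalesce(col1, 0)
```

Any other literal works the same way. For example, a quoted string such as `"unknown"` could serve as the fallback for a text column (a hypothetical value chosen for illustration).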

The Formula language gives you the flexibility to achieve more customized tasks. For example, you can combine functions in the same expression.

[Image: kb-coalesce-3.png]

Here we fill the empty values of `col1` with the corresponding floored values of `col2` in a new column.
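A combined expression of this kind can be sketched by nesting one function call inside another. Assuming again that the columns are named `col1` and `col2`, and that `col2` holds numeric values, it would presumably read:

```
coalesce(col1, floor(col2))
```

The inner `floor(col2)` rounds the fallback down to the nearest integer. Existing values of `col1` are left untouched, because `coalesce()` only uses the second argument when the first is empty.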

Where can I find more information?

  • See this article and video to learn more about using Formulas in Dataiku DSS.

What’s next?

  • You can also learn more about visual data wrangling in DSS with this series of hands-on tutorials.