Check out the first Dataiku 8 Deep Dive focusing on Productivity on October 29th

# Safe sums across columns in DSS Formulas

People will usually tell you that 1+1=2. In general, you can believe them; however, in some cases in DSS, the following may also be true from time to time : 1+1=11.

🤯 🤔 “What in the name of the bird?!

### TL;DR

Why might these two numbers be concatenating when they’re not strings? That's because there are 3 things you need to know about  operations in the Formulas editor:

1. Be careful when using `+` or `-`: only do it if your data does not contain any null values.
2. Use the `sum()` function instead:  `sum([col_A, col_B, col _C, col_D])` ← use those brackets 👌
3. Or use a nested IF: `if(isnull(col_A),0,col_A)+if(isnull(col_B),0,col_B)` ← tedious, but also effective

### When do 1+1=11 and 1+1=2 both happen?

I was recently playing with a dataset in which I needed to create a sum across columns, and I found something that astonished me with a formula that went A+B+C+D :

 A int B int C int D int =A+B+C+D int 1 2 3 4 10 null 2 3 4 234 1 null 3 4 134 1 2 null 4 34 1 2 3 null 6

"But all my data is stored as integers! What is the logic behind this add/concatenate behavior?"

### Why?

🧐 Let’s see what’s happening:

null+2+3+4=234 ← (null+2), then concatenate 3, concatenate 4

1+null+3+4=134 ← (1+null), then concatenate 3, concatenate 4

1+2+null+4=34 ← add 1+2, then (+ null), then concatenate 4

1+2+3+null=6 ← add 1+2 , add 3, (+null) then...

See the pattern of how addition turns into concatenation? It always happens after a null value. Why? The short answer is, "It’s complicated".

### Switches from Add → Concatenate

Let's K.I.S.S. the long answer:

Auto-detection for operations occurs by value and not by column.

The detection of the column datatype occurs after all the operations have been completed.

Concatenate always follows null + anything. It doesn’t matter that all of the data in your columns are numbers, or that they’re stored as a number data type, or that logically 1+nothing+3+4 should be 8, not 134.

What happened in the case of the lovely A+B+C+D above is that null values are turned into “”. Whenever the processor sees anything with a + “” it defaults to concatenation and turns the result into a string.

 What happens IRL Voice of the processor A+B are two numbers... Two numbers? We add those... yet as soon as C comes into the picture, let’s assume C is a null value... I see that C is null. ...it flips a switch... null is “”   “” is string. ..and the processor will forever concatenate whatever comes after. + and string is always concatenate.  Result is string. Repeat to ♾️

### Another Example

In the following example, everything works according to how we’d expect it to because G is the result of several, nested operations:

 A int B int C int D int E=A+B Int F=E+C Int G=F+D Int 1 2 3 4 3 6 10 2 3 4 2 5 9 1 3 4 1 4 8 1 2 4 3 3 7 1 2 3 3 6 6

Let’s walk through an example of how the numbers in column E get crunched:

 What happens IRL: A+B Voice of the processor 1+2→ 3 Number and number? Add. Null +2 → “”+2→ 2 String and number? Concatenate. 1+ Null → 1+“” → 1 One number and string? Concatenate. Col type = int Column type is number because column detection happens at the end!

### Tips and tricks for safe sums in DSS

• Your first option is to get rid of null values. Maybe you replace them with 0 or maybe you do something else; it'll depend on your data.
• There are many cases, however, where you want to keep null values to do other types of analyses (count null, anyone?). If so, do either of these:
• Use the `sum()` function instead: `sum([A,B,C,D])` ← see those brackets? Use them!
• Or use a nested IF: `if(isnull(A),0,A)+if(isnull(B),0,B)` ← tedious, but also effective
Dataiker

Hi @AshleyW,

One tip: you can replace `if(isnull(A),0,A)` by `coalesce(A,0)`.

Labels (1)
• ### Formula

Version history
Revision #:
6 of 6
Last update:
‎03-17-2020 02:07 PM
Updated by:
Contributors