Safe sums across columns in DSS Formulas
People will usually tell you that 1+1=2. In general, you can believe them; however, in some cases in DSS, the following may also be true from time to time : 1+1=11.
🤯
TL;DR
Why might these two numbers be concatenating when they’re not strings? That's because there are 3 things you need to know about operations in the Formulas editor:
- Be careful when using
+
or-
: only do it if your data does not contain any null values. - Use the
sum()
function instead:sum([col_A, col_B, col _C, col_D])
← use those brackets - Or use a nested IF:
if(isnull(col_A),0,col_A)+if(isnull(col_B),0,col_B)
← tedious, but also effective
When do 1+1=11 and 1+1=2 both happen?
I was recently playing with a dataset in which I needed to create a sum across columns, and I found something that astonished me with a formula that went A+B+C+D :
A int | B int | C int | D int | =A+B+C+D int |
1 | 2 | 3 | 4 | 10 |
null | 2 | 3 | 4 | 234 |
1 | null | 3 | 4 | 134 |
1 | 2 | null | 4 | 34 |
1 | 2 | 3 | null | 6 |
"But all my data is stored as integers! What is the logic behind this add/concatenate behavior?"
Why?
🧐 Let’s see what’s happening:
1+2+3+4=10 ← add 1+2 , add 3, add 4
null+2+3+4=234 ← (null+2), then concatenate 3, concatenate 4
1+null+3+4=134 ← (1+null), then concatenate 3, concatenate 4
1+2+null+4=34 ← add 1+2, then (+ null), then concatenate 4
1+2+3+null=6 ← add 1+2 , add 3, (+null) then...
See the pattern of how addition turns into concatenation? It always happens after a null value. Why? The short answer is, "It’s complicated".
Switches from Add → Concatenate
Let's K.I.S.S. the long answer:
Auto-detection for operations occurs by value and not by column.
The detection of the column datatype occurs after all the operations have been completed.
Concatenate always follows null + anything. It doesn’t matter that all of the data in your columns are numbers, or that they’re stored as a number data type, or that logically 1+nothing+3+4 should be 8, not 134.
What happened in the case of the lovely A+B+C+D above is that null values are turned into “”. Whenever the processor sees anything with a + “” it defaults to concatenation and turns the result into a string.
What happens IRL | Voice of the processor |
A+B are two numbers... | Two numbers? We add those... |
yet as soon as C comes into the picture, let’s assume C is a null value... | I see that C is null. |
...it flips a switch... | null is “” “” is string. |
..and the processor will forever concatenate whatever comes after. | + and string is always concatenate. Result is string. Repeat to |
Another Example
In the following example, everything works according to how we’d expect it to because G is the result of several, nested operations:
A int | B int | C int | D int | E=A+B Int | F=E+C Int | G=F+D Int |
1 | 2 | 3 | 4 | 3 | 6 | 10 |
2 | 3 | 4 | 2 | 5 | 9 | |
1 | 3 | 4 | 1 | 4 | 8 | |
1 | 2 | 4 | 3 | 3 | 7 | |
1 | 2 | 3 | 3 | 6 | 6 |
Let’s walk through an example of how the numbers in column E get crunched:
What happens IRL: A+B | Voice of the processor |
1+2→ 3 | Number and number? Add. |
Null +2 → “”+2→ 2 | String and number? Concatenate. |
1+ Null → 1+“” → 1 | One number and string? Concatenate. |
Col type = int | Column type is number because column detection happens at the end! |
Tips and tricks for safe sums in DSS
- Your first option is to get rid of null values. Maybe you replace them with 0 or maybe you do something else; it'll depend on your data.
- There are many cases, however, where you want to keep null values to do other types of analyses (count null, anyone?). If so, do either of these:
- Use the
sum()
function instead:sum([A,B,C,D])
← see those brackets? Use them! - Or use a nested IF:
if(isnull(A),0,A)+if(isnull(B),0,B)
← tedious, but also effective