Wrong bubble sizes in grouped bubble charts

johan
Level 1

Hi,



I'm trying to understand what the relationship is between the size of bubbles in a bubble chart (EDIT: more precisely a grouped bubble chart) and the underlying values. I would expect the area of the bubbles to be in a linear relationship with the values they represent, but I have a chart in which this is clearly not the case.



Below is a bubble chart on a small data set of movie ratings. The value for the second bubble (1940-1945) is 3. The value for the second-to-last bubble is 309, i.e., about 100 times more. However, that bubble's area looks only roughly 10x as big as the 1940-1945 one, clearly not 100x as big.
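For reference, here is the arithmetic behind that expectation as a minimal Python sketch, assuming the bubble area is exactly proportional to the value (the helper name is made up for illustration). A 309 vs. 3 value ratio (103x) should then give a 103x area ratio, which corresponds to only about a 10x ratio in radius, because the radius grows with the square root of the value.

```python
import math

def radius_for_area_proportional(value: float, scale: float = 1.0) -> float:
    """Hypothetical helper: radius such that pi * r**2 is proportional to value."""
    return scale * math.sqrt(value / math.pi)

small, large = 3, 309  # the two values quoted in the question
value_ratio = large / small
r_small = radius_for_area_proportional(small)
r_large = radius_for_area_proportional(large)
area_ratio = (r_large / r_small) ** 2

print(f"value ratio:  {value_ratio:.1f}")        # 103.0
print(f"radius ratio: {r_large / r_small:.1f}")  # 10.1
print(f"area ratio:   {area_ratio:.1f}")         # 103.0
```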



Is this the expected behavior? Is this a bug?





EDIT: I think this is a bug. I get very different results depending on the base radius I set for the bubbles. When I set a base radius of 10, all bubbles come out the same size:





Whereas when I set a very small OR very large base radius, I get more differentiated (and realistic) results. Here is the result for base radius = 1:





I am using Dataiku 4.1.0. The same behavior can be observed on tutorial 103's customers_orders_joined dataset for base radius = 10:



Clément_Stenac
Dataiker
Hi,

We agree that, while not "buggy" per se, the current behavior is a bit wonky.

The idea is that we compute the size so that the surface is proportional to the value, but we then remap it to a limited range of values, both to avoid having insanely huge or invisible bubbles and because the radius has to be an integer number of pixels anyway.
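A minimal sketch of that compute-then-remap idea, under stated assumptions: the largest value gets the maximum radius, smaller values are scaled by the square root of their value (so area stays proportional), and the result is clamped to [r_min, r_max] and rounded to whole pixels. `bubble_radii`, `r_min` and `r_max` are hypothetical names and this is not Dataiku's actual code; 50 is a made-up value placed between the 3 and 309 from the question.

```python
import math

def bubble_radii(values, r_min, r_max):
    """Hypothetical sketch (not Dataiku's code): area-proportional radii,
    squeezed into [r_min, r_max] and rounded to integer pixels."""
    v_max = max(values)
    radii = []
    for v in values:
        # Area proportional to value => radius proportional to sqrt(value);
        # the largest value is mapped to r_max.
        exact = r_max * math.sqrt(v / v_max)
        # Clamp so tiny values stay visible, then round to whole pixels.
        radii.append(round(min(max(exact, r_min), r_max)))
    return radii

print(bubble_radii([3, 50, 309], r_min=1, r_max=10))   # [1, 4, 10]
print(bubble_radii([3, 50, 309], r_min=10, r_max=10))  # [10, 10, 10]
```

Clamping and rounding are where proportionality is lost in this sketch: the bubble for 3 would ideally get a radius just under 1 px and is pushed up to 1, and when `r_min == r_max` every bubble ends up the same size.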

What's wonky, and what we'll probably change, is the way the limited range is computed:

* If base radius is below 10, it's between 1 and base radius
* If base radius is above 10, it's between 10 and base radius

So indeed, if base radius is 10, the range is between 10 and 10, which is why all your bubbles get the same size.
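Taken literally, the two bullets describe a rule like the hypothetical `radius_range` helper below (a sketch of the rule as stated, not the product's code). It makes the base-radius-10 case explicit: the range collapses to a single possible integer radius.

```python
def radius_range(base_radius: int) -> tuple[int, int]:
    """Hypothetical helper encoding the two bullets above literally."""
    if base_radius < 10:
        return (1, base_radius)
    # "Above 10" is taken to include 10 itself, as in the "10 and 10" sentence.
    return (10, base_radius)

for base in (5, 10, 30):
    lo, hi = radius_range(base)
    print(base, (lo, hi), "possible integer radii:", hi - lo + 1)
# 5 (1, 5) possible integer radii: 5
# 10 (10, 10) possible integer radii: 1
# 30 (10, 30) possible integer radii: 21
```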

With base radius 1, the size of your circles is actually correct: the big one is "300 times bigger" than the small one. But since the radius has to be rounded to an integer and there are only 10 possible values, the rounding cuts it down to only 10 times bigger.
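The cap described here is plain arithmetic. Assuming integer radii from 1 to 10 px, as the reply implies, the largest bubble can have at most 10x the radius, and therefore at most 100x the area, of the smallest one, while a faithful 300x area ratio would need a radius ratio of about 17.3:

```python
# Integer radii from 1 to 10 px: the only sizes a bubble can take in this case.
radii = range(1, 11)
max_radius_ratio = max(radii) / min(radii)  # best achievable radius ratio
max_area_ratio = max_radius_ratio ** 2      # area scales with radius squared

true_value_ratio = 300                        # the "300 times bigger" above
ideal_radius_ratio = true_value_ratio ** 0.5  # radius ratio a faithful chart needs

print(len(radii), max_radius_ratio, max_area_ratio, round(ideal_radius_ratio, 1))
# 10 10.0 100.0 17.3
```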

We'll have a look at improving that.

johan
Level 1
Author
Thank you for your answer.

Regarding "If base radius is below 10, it's between 1 and base radius": I don't think that's true. It looks to me like the size of the bubbles is always between the base radius and 10, either way.

I wholeheartedly approve of changing this behavior. I think it is both unnecessary and a pretty terrible thing to do to a data visualization, because it basically renders it meaningless. I mean, would you silently clip or "rebalance" a bar chart just to make it look better?

And I think it is unnecessary because as a user it is pretty easy for me to add a column with the data converted to a log scale or something; the interface is pretty good that way.
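The log-scale workaround amounts to adding a derived column and sizing the bubbles on it. A minimal plain-Python sketch on a list of values (only 3 and 309 come from the question; 50 is made up); inside DSS this would be done through the interface, as described above, rather than with this code, and a log transform requires strictly positive values.

```python
import math

# Hypothetical per-group values; must be > 0 for log10.
values = [3, 50, 309]

# Derived log10 column: sizing bubbles on it compresses large differences
# deliberately, and the transformation is visible to whoever reads the chart.
log_values = [math.log10(v) for v in values]
print([round(x, 2) for x in log_values])  # [0.48, 1.7, 2.49]
```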