The Algorithm of Auto-Layout for Unit Visualization

What is unit visualization?

Unit visualization is a technique used to efficiently arrange and display individual data units. It is widely used in data visualization, such as in heatmaps, scatter plots, and tree maps.

Here is an example of unit visualization for Gun Deaths in America:

Unit Visualization Example

Unit Visualization Example

A more complex example is the video below, which is a visualization for World War II.

Problem definition: the goal of auto-layout algorithm

The naive definition

Now, let's define the problem more formally.

  1. We have a box with a fixed size width ww and height hh.
  2. We want to put nn data points (units) into this box.
  3. We want there are gaps between units. Since we don't know the width and height of the units, so we identify the gap as ratio rxr_x and ryr_y to the width and height of the box. Therefore, gapx=rxx\text{gap}_x = r_x * x and gapy=ryy\text{gap}_y = r_y * y. where xx and yy are the width and height of the units.
  4. We want to calculate the width and height of the units so that the units fill the entire box, with the outermost units touching the edges of the box, and only allowing the last column of units to be incomplete.

For example, the following figure shows the problem when n=46n = 46, w=box widthw = \text{box width}, h=box heighth = \text{box height}, rx=0.5r_x = 0.5, and ry=1r_y = 1. The algorithm should calculate the width and height of the units so that we can get the following visualization.

Problem Definition

Problem Definition

Fix the naive definition

However, do you believe the previous problem always has a solution? The answer is no. Because if we make the width of the box a little bit larger, but not large enough to put another unit, then the gapxgap_x will not be exactly rxxr_x * x.

Therefore, we need to introduce a new variable called offsetx\text{offset}_x, which is the additional horizontal space between two units. Then we can make sure the problem always has a solution. And the final point of the problem definition becomes:

We want to calculate the width and height of the units so that the units fill the entire box with offsetx\text{offset}_x. The outermost units should touch the edges of the box. And we only allow the last column of units to be incomplete.

Problem Definition with Offset

Problem Definition with Offset

Therefore, our goal is not only to calculate the width and height of the units, but also to make offsetx\text{offset}_x as small as possible.

Why we don't need offsety\text{offset}_y

Theoretically, we can have offsety\text{offset}_y and this make the algorithm more complex. But in practice, we make that offsety\text{offset}_y is always 0. and use offsetx\text{offset}_x to solve the issue that sometimes the gap cannot be exactly we defined.

The proof of the algorithm

The constraints

By the previous definition, we can get the following equations:

ncolx+(ncol1)(xrx+offsetx)=w\begin{equation} n_{\text{col}} x + \left(n_{\text{col}} - 1\right) \cdot \left(x r_x + \text{offset}_x\right) = w \end{equation}
nrow y+(nrow 1)yry=h\begin{equation} n_{\text {row }} y+\left(n_{\text {row }}-1\right) \cdot y r_y=h \end{equation}
ncolnrown\begin{equation} n_{\text{col}} n_{\text{row}} \geq n \end{equation}

The previous three equations are all the constraints of the problem. Therefore, our goal is to solve them and make offsetx\text{offset}_x as small as possible.

If we find the value of one of ncoln_{\text{col}} and nrown_{\text{row}}, we know how to put the units into the box. Then we can calculate xx, yy, and offsetx\text{offset}_x.

Find range of nrown_{\text{row}}

From the equation (1), we can get

ncolx+(ncol1)xrxw\begin{equation} n_{\text{col}} x + (n_{\text{col}} - 1) \cdot x r_x \leq w \end{equation}

Let's define the aspect ratio kk of the units that k=xyk = \frac{x}{y}

Combine the equation (2) and inequality (4), we can get

ncol(w+kyrx)(1+ry)(h+yry)(k+krx)nrow\begin{equation} n_{\text{col}} \leq \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}} \end{equation}

Please note that ncoln_{\text{col}} is a positive integer, so the inequality (5) should be stricter as:

ncol(w+kyrx)(1+ry)(h+yry)(k+krx)nrowϵ\begin{equation} n_{\text{col}} \leq \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}} - \epsilon \end{equation}

Where ϵ\epsilon is a small number in range of [0,1)[0, 1) to make sure ncoln_{\text{col}} is an integer.

Combine this with the inequality (3), we can get the following equation:

(w+kyrx)(1+ry)(h+yry)(k+krx)nrow2n+ϵnrow\begin{equation} \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}}^2 \geq n + \epsilon n_{\text{row}} \end{equation}

Because n+ϵnrownn + \epsilon n_{\text{row}} \geq n

(w+kyrx)(1+ry)(h+yry)(k+krx)nrow2n\begin{equation} \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}}^2 \geq n \end{equation}

Inequality (8) is a looser constraint compared to inequality (7), which means the solution (the range of nrown_{\text{row}}) of inequality (7) should be a subset of the solution of inequality (8).

Because we do not know ϵ\epsilon, to solve inequality (7), the only way is to solve inequality (8) first, which can narrow down the range of nrown_{\text{row}}.

To solve (8), only yy is a unknown variable and all other variables are constants. And from equation (2), we can transfer yy to function of nrown_{\text{row}}.

y=hnrow(1+ry)ry\begin{equation} y = \frac{h}{n_{\text{row}}(1+ r_y) - r_y} \end{equation}

Substituting this expression into the inequality (8), we finally can get the inequality for nrown_{\text{row}}:

nrow[w(1+ry)2nrow2+(khrxwry)nrownhk(1+rx)(1+ry)]0\begin{equation} n_{\text{row}} \left[ w(1+r_y)^2 n_{\text{row}}^2 + (k h r_x - w r_y) n_{\text{row}} - n h k (1 + r_x) (1 + r_y) \right] \geq 0 \end{equation}

Now, we can find the roots of the previous cubic equation.

Obviously, there is a root nrow=0n_{\text{row}} = 0. And the other two roots can be found by solving the following quadratic equation:

w(1+ry)2nrow2+(khrxwry)nrownhk(1+rx)(1+ry)=0\begin{equation} w(1+r_y)^2 n_{\text{row}}^2 + (k h r_x - w r_y) n_{\text{row}} - n h k (1 + r_x) (1 + r_y) = 0 \end{equation}
a=w(1+ry)2>0b=(khrxwry)c=nhk(1+rx)(1+ry)<0\begin{align*} a &= w(1+r_y)^2 > 0 \\ b &= (k h r_x - w r_y) \\ c &= -n h k (1 + r_x) (1 + r_y) < 0 \end{align*}

The two roots are:

nrow=b±b24ac2a\begin{equation*} n_{\text{row}} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \end{equation*}

Note that 4ac>0-4ac > 0, so b24ac>b\sqrt{b^2 - 4ac} > \left| b \right|. Therefore

b+b24ac>0bb24ac<0\begin{align*} -b + \sqrt{b^2 - 4ac} &> 0 \\ -b - \sqrt{b^2 - 4ac} &< 0 \end{align*}

Therefore, the two roots are on the left side and the right side of 0. Also because a>0a > 0, we can draw the cubic function:

f(nrow)=nrow[w(1+ry)2nrow2+(khrxwry)nrownhk(1+rx)(1+ry)]\begin{equation*} f(n_{\text{row}}) = n_{\text{row}} \left[ w(1+r_y)^2 n_{\text{row}}^2 + (k h r_x - w r_y) n_{\text{row}} - n h k (1 + r_x) (1 + r_y) \right] \end{equation*}
The cubic function

The cubic function

Because f(nrow)0f(n_{\text{row}}) \geq 0 and nrown_{\text{row}} is a positive integer.

nrowb+b24ac2a\begin{equation*} n_{\text{row}} \geq \left\lceil \frac{-b + \sqrt{b^2 - 4ac}}{2a} \right\rceil \end{equation*}

Here is the code for finding the lower bound of nrown_{\text{row}}:

1function getMinNRow(
2 box: [number, number],
3 aspectRatio: number,
4 gapRatio: [number, number],
5 n: number
6): number {
7 const [w, h] = box;
8 const a = Math.pow(w * (1 + gap[1]), 2);
9 const b = gap[0] * aspectRatio * h - w * gap[1];
10 const c = -n * h * aspectRatio * (1 + gapRatio[0]) * (1 + gapRatio[1]);
11 const delta = Math.sqrt(b * b - 4 * a * c);
12 return Math.ceil((-b + delta) / (2 * a));
13}

Find exact nrown_{\text{row}}

Because nrowb+b24ac2an_{\text{row}} \geq \left\lceil \frac{-b + \sqrt{b^2 - 4ac}}{2a} \right\rceil is only the solution for the inequality (8), which means the nrown_{\text{row}} satisfying this condition may not be the solution for the inequality (7). We still need to find the solution for the inequality (7) that minimize offsetx\text{offset}_x.

The idea to find the best nrown_{\text{row}} that minimize offsetx\text{offset}_x is to

  1. Enumerate all possible nrown_{\text{row}} from the lower bound.
  2. For each nrown_{\text{row}}, we can calculate the ncol=nnrown_{\text{col}} = \left\lceil \frac{n}{n_{\text{row}}} \right\rceil and x=ky=khnrow(1+ry)yx = k y = \frac{kh}{n_{\text{row}} (1 + r_y) - y}.
  3. Test if w=ncolx+(ncol1)xrxww\prime = n_{\text{col}} x + (n_{\text{col}} - 1) \cdot x r_x \leq w. If so, we find the best nrown_{\text{row}} and return.

This means the first nrown_{\text{row}} that satisfies this condition is the best nrown_{\text{row}}. (Think about why)

Here is the code for finding the best nrown_{\text{row}}:

1function getLayout(
2 minNRow: number,
3 count: number,
4 box: [number, number]
5 aspectRatio: number,
6 gapRatio: [number, number]
7) {
8 const [w, h] = box;
9
10 let NRow = minNRow;
11 let NCol;
12 let x;
13 let y;
14 let totalWidth;
15
16 do {
17 y = h / (NRow * (1 + gapRatio[1]) - gapRatio[1]);
18 x = aspectRatio * y;
19 NCol = Math.ceil(count / NRow);
20 totalWidth = NCol * x + (NCol - 1) * gapRatio[0] * x;
21 } while (totalWidth > w && NRow++);
22
23 return { NRow, NCol, y, x };
24}

Calculate the offset

After we get {NRow, NCol, y, x} by function getLayout, we can calculate the offset by:

offsetx=wncolx(ncol1)xrxncol1\begin{equation*} \text{offset}_x = \frac{w - n_{\text{col}} x - (n_{\text{col}} - 1) \cdot x r_x}{n_{\text{col}} - 1} \end{equation*}

The edge cases

If n>0n > 0 and the algorithm returns ncol=1n_{\text{col}} = 1, we should not use the offset and gap to position the units. Instead, we just put all units in a single column with equal space between each other.