The Algorithm of Auto-Layout for Unit Visualization

#
What is unit visualization?

Unit visualization is a technique used to efficiently arrange and display individual data units. It is widely used in data visualization, such as in heatmaps, scatter plots, and tree maps.

Here is an example of unit visualization for Gun Deaths in America:

[Figure: Unit Visualization Example]

A more complex example is the video below, which is a visualization for World War II.

#
Problem definition: the goal of auto-layout algorithm

#
The naive definition

Now, let's define the problem more formally.

We have a box with a fixed width $w$ and height $h$.
We want to put $n$ data points (units) into this box.
We want gaps between the units. Since we do not know the width and height of the units in advance, we specify the gaps as ratios $r_x$ and $r_y$ of the width and height of a unit. Therefore, $\text{gap}_x = r_x \cdot x$ and $\text{gap}_y = r_y \cdot y$, where $x$ and $y$ are the width and height of a unit.
We want to calculate the width and height of the units so that the units fill the entire box, with the outermost units touching the edges of the box, and only allowing the last column of units to be incomplete.

For example, the following figure shows the problem when $n = 46$ , $w = \text{box width}$ , $h = \text{box height}$ , $r_x = 0.5$ , and $r_y = 1$ . The algorithm should calculate the width and height of the units so that we can get the following visualization.

[Figure: Problem Definition]

#
Fix the naive definition

However, does the previous problem always have a solution? The answer is no. If we make the box slightly wider, but not wide enough to fit another column of units, then $\text{gap}_x$ can no longer be exactly $r_x \cdot x$.

Therefore, we need to introduce a new variable called $\text{offset}_x$, which is the additional horizontal space between two adjacent units. With it, the problem always has a solution, and the final point of the problem definition becomes:

We want to calculate the width and height of the units so that the units, together with $\text{offset}_x$, fill the entire box. The outermost units should touch the edges of the box, and only the last column of units may be incomplete.

[Figure: Problem Definition with Offset]

Therefore, our goal is not only to calculate the width and height of the units, but also to make $\text{offset}_x$ as small as possible.

#
Why we don't need $\text{offset}_y$

Theoretically, we could also introduce $\text{offset}_y$, but this would make the algorithm more complex. In practice, we fix $\text{offset}_y$ at 0 and rely on $\text{offset}_x$ alone to absorb the cases where the gaps cannot be exactly as defined.

#
The proof of the algorithm

#
The constraints

By the previous definition, we can get the following equations:

\begin{equation} n_{\text{col}} x + \left(n_{\text{col}} - 1\right) \cdot \left(x r_x + \text{offset}_x\right) = w \end{equation}

\begin{equation} n_{\text {row }} y+\left(n_{\text {row }}-1\right) \cdot y r_y=h \end{equation}

\begin{equation} n_{\text{col}} n_{\text{row}} \geq n \end{equation}

The previous two equations and one inequality are all the constraints of the problem. Therefore, our goal is to solve them while making $\text{offset}_x$ as small as possible.

If we find the value of one of $n_{\text{col}}$ and $n_{\text{row}}$ , we know how to put the units into the box. Then we can calculate $x$ , $y$ , and $\text{offset}_x$ .
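As a rough illustration, the three constraints can be written as a small checker that tests whether a candidate layout is valid. The `Layout` interface, the function name `satisfiesConstraints`, and the floating-point `tolerance` parameter are assumptions made for this sketch only.

```typescript
interface Layout {
  nCol: number;
  nRow: number;
  x: number; // unit width
  y: number; // unit height
  offsetX: number; // extra horizontal space between adjacent columns
}

// Hypothetical checker: true when a layout satisfies constraints (1)-(3).
function satisfiesConstraints(
  layout: Layout,
  n: number,
  w: number,
  h: number,
  rx: number,
  ry: number,
  tolerance = 1e-9 // assumed floating-point tolerance
): boolean {
  const { nCol, nRow, x, y, offsetX } = layout;
  const usedWidth = nCol * x + (nCol - 1) * (x * rx + offsetX); // (1)
  const usedHeight = nRow * y + (nRow - 1) * y * ry; // (2)
  return (
    Math.abs(usedWidth - w) <= tolerance &&
    Math.abs(usedHeight - h) <= tolerance &&
    nCol * nRow >= n // (3)
  );
}

// 15 units of size 20 x 20 in a 100 x 100 box, 5 columns by 3 rows.
const example: Layout = { nCol: 5, nRow: 3, x: 20, y: 20, offsetX: 0 };
console.log(satisfiesConstraints(example, 15, 100, 100, 0, 1)); // true
```

A checker like this is mainly useful in tests: whatever row count the algorithm picks, the resulting layout should pass it.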

#
Find range of $n_{\text{row}}$

Because $\text{offset}_x \geq 0$, equation (1) gives:

\begin{equation} n_{\text{col}} x + (n_{\text{col}} - 1) \cdot x r_x \leq w \end{equation}

Let's define the aspect ratio of the units as $k = \frac{x}{y}$ and treat it as a known constant.

Combining equation (2) and inequality (4), we get:

\begin{equation} n_{\text{col}} \leq \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}} \end{equation}

Note that $n_{\text{col}}$ must be a positive integer, so inequality (5) can be tightened to:

\begin{equation} n_{\text{col}} \leq \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}} - \epsilon \end{equation}

where $\epsilon$ is a number in the range $[0, 1)$ that accounts for rounding the right-hand side down so that $n_{\text{col}}$ is an integer.

Combining this with inequality (3), we get the following inequality:

\begin{equation} \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}}^2 \geq n + \epsilon n_{\text{row}} \end{equation}

Because $n + \epsilon n_{\text{row}} \geq n$, this implies:

\begin{equation} \frac{(w + k y r_x)(1 + r_y)}{(h + y r_y)(k + k r_x )} n_{\text{row}}^2 \geq n \end{equation}

Inequality (8) is a looser constraint compared to inequality (7), which means the solution (the range of $n_{\text{row}}$ ) of inequality (7) should be a subset of the solution of inequality (8).

Because we do not know $\epsilon$, the only way to solve inequality (7) is to solve inequality (8) first, which narrows down the range of $n_{\text{row}}$.

In inequality (8), $y$ is the only unknown besides $n_{\text{row}}$; all other variables are constants. From equation (2), we can express $y$ as a function of $n_{\text{row}}$:

\begin{equation} y = \frac{h}{n_{\text{row}}(1+ r_y) - r_y} \end{equation}
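As a quick numeric check of equation (9), take the illustrative values $h = 100$, $r_y = 1$, and $n_{\text{row}} = 3$ (chosen only for this example):

\begin{equation*} y = \frac{100}{3 \cdot (1 + 1) - 1} = 20, \qquad n_{\text{row}} y + (n_{\text{row}} - 1) \cdot y r_y = 3 \cdot 20 + 2 \cdot 20 \cdot 1 = 100 = h \end{equation*}

so the three rows and the two gaps between them fill the box height exactly, as equation (2) requires.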

Substituting this expression into inequality (8), clearing the denominators (all of which are positive), and cancelling the common factor $(1 + r_y)$, we finally get the inequality for $n_{\text{row}}$:

\begin{equation} n_{\text{row}} \left[ w(1+r_y) n_{\text{row}}^2 + (k h r_x - w r_y) n_{\text{row}} - n h k (1 + r_x) \right] \geq 0 \end{equation}

Now, we can find the roots of the cubic polynomial on the left-hand side.

Obviously, there is a root $n_{\text{row}} = 0$. The other two roots can be found by solving the following quadratic equation:

\begin{equation} w(1+r_y) n_{\text{row}}^2 + (k h r_x - w r_y) n_{\text{row}} - n h k (1 + r_x) = 0 \end{equation}

\begin{align*} a &= w(1+r_y) > 0 \\ b &= k h r_x - w r_y \\ c &= -n h k (1 + r_x) < 0 \end{align*}

The two roots are:

\begin{equation*} n_{\text{row}} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \end{equation*}

Note that $-4ac > 0$, so $\sqrt{b^2 - 4ac} > \left| b \right|$. Therefore

\begin{align*} -b + \sqrt{b^2 - 4ac} &> 0 \\ -b - \sqrt{b^2 - 4ac} &< 0 \end{align*}

Therefore, one root is negative and the other is positive. Since $a > 0$ as well, we can draw the cubic function:

\begin{equation*} f(n_{\text{row}}) = n_{\text{row}} \left[ w(1+r_y) n_{\text{row}}^2 + (k h r_x - w r_y) n_{\text{row}} - n h k (1 + r_x) \right] \end{equation*}

[Figure: The cubic function]

Because we need $f(n_{\text{row}}) \geq 0$ and $n_{\text{row}}$ is a positive integer, we get:

\begin{equation*} n_{\text{row}} \geq \left\lceil \frac{-b + \sqrt{b^2 - 4ac}}{2a} \right\rceil \end{equation*}

Here is the code for finding the lower bound of $n_{\text{row}}$ :

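```typescript
function getMinNRow(
  box: [number, number],
  aspectRatio: number,
  gapRatio: [number, number],
  n: number
): number {
  const [w, h] = box;
  const [rx, ry] = gapRatio;
  // Coefficients of a * nRow^2 + b * nRow + c >= 0, obtained by substituting
  // y = h / (nRow * (1 + ry) - ry) into inequality (8) and cancelling the
  // common (1 + ry) factor.
  const a = w * (1 + ry);
  const b = rx * aspectRatio * h - w * ry;
  const c = -n * h * aspectRatio * (1 + rx);
  const delta = Math.sqrt(b * b - 4 * a * c);
  // The smallest integer at or above the positive root is the lower bound.
  return Math.ceil((-b + delta) / (2 * a));
}
```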
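One way to sanity-check the closed-form bound is to evaluate inequality (8) directly for $n_{\text{row}} = 1, 2, 3, \dots$ and stop at the first value that satisfies it. The sketch below does that; the function names `capacity` and `bruteForceMinNRow` and the box size used in the example are assumptions made for illustration, not part of the original algorithm.

```typescript
// Left-hand side of inequality (8) for a given row count.
function capacity(
  nRow: number,
  w: number,
  h: number,
  k: number,
  rx: number,
  ry: number
): number {
  const y = h / (nRow * (1 + ry) - ry); // equation (9)
  const ratio = ((w + k * y * rx) * (1 + ry)) / ((h + y * ry) * (k + k * rx));
  return ratio * nRow * nRow;
}

// Hypothetical brute-force search: the smallest nRow that satisfies (8).
function bruteForceMinNRow(
  n: number,
  w: number,
  h: number,
  k: number,
  rx: number,
  ry: number
): number {
  let nRow = 1;
  while (capacity(nRow, w, h, k, rx, ry) < n) {
    nRow++;
  }
  return nRow;
}

// 46 square units in an assumed 200 x 100 box with r_x = 0.5 and r_y = 1.
console.log(bruteForceMinNRow(46, 200, 100, 1, 0.5, 1)); // 5
```

For these values the left-hand side of (8) is about 38.7 at four rows and about 61.7 at five, so five is the smallest row count that can hold 46 units. The loop is linear in the answer, which is why the closed form is preferable in practice.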

#
Find exact $n_{\text{row}}$

The bound $n_{\text{row}} \geq \left\lceil \frac{-b + \sqrt{b^2 - 4ac}}{2a} \right\rceil$ only solves inequality (8), so an $n_{\text{row}}$ that satisfies it may still violate inequality (7). We still need to find the solution of inequality (7) that minimizes $\text{offset}_x$.

To find the $n_{\text{row}}$ that minimizes $\text{offset}_x$, we:

Enumerate the candidate values of $n_{\text{row}}$ in increasing order, starting from the lower bound.
For each $n_{\text{row}}$, calculate $n_{\text{col}} = \left\lceil \frac{n}{n_{\text{row}}} \right\rceil$ and $x = k y = \frac{kh}{n_{\text{row}} (1 + r_y) - r_y}$.
Test whether $w' = n_{\text{col}} x + (n_{\text{col}} - 1) \cdot x r_x \leq w$. If so, we have found the best $n_{\text{row}}$ and stop.

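This means the first $n_{\text{row}}$ that satisfies this condition is the best one. The reason is that increasing $n_{\text{row}}$ further shrinks $y$ (and therefore $x$) and never increases $n_{\text{col}}$, so the occupied width $w'$ only decreases and $\text{offset}_x$ only grows.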
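To see this numerically, one can tabulate $\text{offset}_x$ for several consecutive row counts. The helper name `offsetForRowCount` and the example values (46 square units in a 200 by 100 box with $r_x = 0.5$ and $r_y = 1$) are assumptions made for this sketch.

```typescript
// Hypothetical helper: offset_x for a given row count, or null when the
// columns do not fit in the box or there is only one column.
function offsetForRowCount(
  nRow: number,
  n: number,
  w: number,
  h: number,
  k: number,
  rx: number,
  ry: number
): number | null {
  const y = h / (nRow * (1 + ry) - ry);
  const x = k * y;
  const nCol = Math.ceil(n / nRow);
  const usedWidth = nCol * x + (nCol - 1) * x * rx;
  if (usedWidth > w || nCol < 2) {
    return null;
  }
  return (w - usedWidth) / (nCol - 1);
}

// Four rows do not fit in this box; five rows are the first feasible choice.
const offsets: Array<number | null> = [];
for (let nRow = 4; nRow <= 8; nRow++) {
  offsets.push(offsetForRowCount(nRow, 46, 200, 100, 1, 0.5, 1));
}
console.log(offsets); // null first, then offsets that grow with nRow
```

With these values the offset is smallest at five rows and keeps growing for six, seven, and eight rows, which matches the claim that the first feasible row count is the best one.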

Here is the code for finding the best $n_{\text{row}}$ :

```typescript
function getLayout(
  minNRow: number,
  count: number,
  box: [number, number],
  aspectRatio: number,
  gapRatio: [number, number]
) {
  const [w, h] = box;

  // Start one below the lower bound because the loop increments first.
  let NRow = minNRow - 1;
  let NCol: number;
  let x: number;
  let y: number;
  let totalWidth: number;

  // Try each row count from the lower bound upward until the columns fit.
  do {
    NRow++;
    y = h / (NRow * (1 + gapRatio[1]) - gapRatio[1]);
    x = aspectRatio * y;
    NCol = Math.ceil(count / NRow);
    totalWidth = NCol * x + (NCol - 1) * gapRatio[0] * x;
  } while (totalWidth > w);

  return { NRow, NCol, y, x };
}
```

#
Calculate the offset

After we get {NRow, NCol, y, x} by function getLayout, we can calculate the offset by:

\begin{equation*} \text{offset}_x = \frac{w - n_{\text{col}} x - (n_{\text{col}} - 1) \cdot x r_x}{n_{\text{col}} - 1} \end{equation*}
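Once the offset is known, the position of every unit follows directly. The sketch below is one possible way to do this, assuming units fill each column from top to bottom so that only the last column can be incomplete; the helper `unitPositions`, the `Point` type, and the example values are hypothetical.

```typescript
interface Point {
  left: number;
  top: number;
}

// Hypothetical helper: top-left corner of each unit, column by column.
function unitPositions(
  n: number,
  nRow: number,
  x: number,
  y: number,
  rx: number,
  ry: number,
  offsetX: number
): Point[] {
  const stepX = x + x * rx + offsetX; // unit width + gap_x + offset_x
  const stepY = y + y * ry; // unit height + gap_y
  const positions: Point[] = [];
  for (let i = 0; i < n; i++) {
    const col = Math.floor(i / nRow);
    const row = i % nRow;
    positions.push({ left: col * stepX, top: row * stepY });
  }
  return positions;
}

// Illustrative values: 46 square units, 5 rows and 10 columns, 200 x 100 box.
const [w, h, rx, ry] = [200, 100, 0.5, 1];
const nRow = 5;
const nCol = 10;
const y = h / (nRow * (1 + ry) - ry);
const x = y; // aspect ratio k = 1
const offsetX = (w - nCol * x - (nCol - 1) * x * rx) / (nCol - 1);
const positions = unitPositions(46, nRow, x, y, rx, ry, offsetX);
const last = positions[positions.length - 1];
console.log(last.left + x); // right edge of the last column, approximately 200
```

The right edge of the last column lands on the box width up to floating-point error, which is the "outermost units touch the edges" requirement from the problem definition.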

#
The edge cases

If $n > 0$ and the algorithm returns $n_{\text{col}} = 1$, the offset formula would divide by $n_{\text{col}} - 1 = 0$, so we should not use the offset and gap to position the units. Instead, we put all units in a single column with equal spacing between them.
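A minimal sketch of this fallback is shown below. It assumes that "equal spacing" means dividing the leftover height evenly between neighbouring units, and that a lone unit is simply placed at the top; both choices, as well as the function name `singleColumnTops`, are assumptions rather than something the text above specifies.

```typescript
// Hypothetical helper: top coordinate of each unit in a single column.
function singleColumnTops(n: number, y: number, h: number): number[] {
  if (n <= 0) {
    return [];
  }
  if (n === 1) {
    return [0]; // arbitrary choice: a lone unit sits at the top
  }
  // Leftover height shared equally between the n - 1 neighbouring pairs.
  const space = (h - n * y) / (n - 1);
  return Array.from({ length: n }, (_, i) => i * (y + space));
}

console.log(singleColumnTops(3, 20, 100)); // [0, 40, 80]
```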
