Addressing

The addressing of data in a bricked volume layout is more costly than in a linear volume layout. To address one data element, one has to address the block itself and the element within the block. In contrast to this addressing scheme, a linear volume can be seen as one large block. To address a sample it is enough to compute just one offset. In algorithms like volume raycasting, which need to access a certain neighborhood of data in each processing step, the computation effort for two offsets instead of one generally cannot be neglected. In a linear volume layout, the offsets to neighboring samples are constant. Using bricking, the whole address computation would have to be performed for each neighboring sample that has to be accessed. To avoid this performance penalty, one can construct an if-else statement. The if-clause consists of checking if the needed data elements can be addressed within one block. If the outcome is true, the data elements can be addressed as fast as in a linear volume. If the outcome is false, the costly address calculations have to be done. This simplifies address calculation, but the involved if-else statement incurs pipeline flushes.

We therefore applied a different approach. We distinguished the possible sample positions by the locations of the needed neighboring samples. The first sample location $(i, j, k)$ is defined by the integer parts of the current resample position. Assuming trilinear interpolation, during resampling neighboring samples to the right, top, and back of the current location are required. A block can be subdivided into subsets. For each subset, we can determine the blocks in which the neighboring samples lie. Therefore, it is possible to store these offsets in a lookup table.

The lookup table contains $8 \cdot 7 = 56$ offsets. We have eight cases, and for each sample $(i, j, k)$ we need the offsets to its seven adjacent samples. The seven neighbors are accessed relative to the sample $(i, j, k)$. Since each offset consists of four bytes the table size is 224 bytes. The basic idea is to extract the eight cases from the current resample position and create an index into a lookup table, which contains the offsets to the neighboring samples.

The input parameters of the lookup table addressing function are the sample position $(i, j, k)$ and the block dimensions $B_x$, $B_y$, and $B_z$. We assume that the block dimensions are a power of two, i.e., $B_x = 2^{N_x}$, $B_y = 2^{N_y}$, and $B_z = 2^{N_z}$. As a first step, the block offset part from $i$, $j$, and $k$ is extracted by a conjunction with the corresponding $B_{\{x, y, z\}}-1$. The next step is to increase all by one to move the maximum possible value of $B_{\{x, y, z\}}-1$ to $B_{\{x,y,z\}}$. All the other possible values stay within the range $[1,B_{\{x, y, z\}}-1]$. Then a conjunction of the resulting value and the complement of $B_{\{x, y, z\}}-1$ is performed, which maps the input values to $[0, B_{\{x, y, z\}}]$. The last step is to add the three values and divide the result by the minimum of the block dimensions, which maps the result to [0,7]. This last division can be exchanged by a shift operation. In summary, the lookup table index for a position $(i, j, k)$ is given by:


\begin{displaymath}
\begin{array}{rcl}
i' & = & (( i \;\&\; (B_x - 1)) + 1 ) \;...
...x & = & (i' + j' + k') \gg \min (N_x ,N_y ,N_z) \\
\end{array}\end{displaymath} (6.1)

We use $\&$ to denote a bitwise and operation, $\vert$ to denote a bitwise or operation, $\gg$ to denote a right shift operation, and $\sim$ to denote a bitwise negation.

A similar approach can be done for the gradient computation. We presented a general solution for a 26-connected neighborhood. Here we can, analogous to the resample case, distinguish 27 cases. The first step is to extract the block offset, by a conjunction with $B_{\{x, y, z\}}-1$. Then we subtract one, and conjunct with $B_{\{x,y,z\}} + B_{\{x,y,z\}} - 1$, to separate the case if one or more components are zero. In other words, zero is mapped to $2 \cdot B_{\{x,y,z\}}-1$. All the other values stay within the range $[0,B_{\{x,y,z\}}-2]$. To separate the case of one or more components being $B_{\{x, y, z\}}-1$, we add 1, after the previous subtraction is undone by a disjunction with 1, without loosing the separation of the zero case. Now all the cases are mapped to $\{0,1,2\}$ to obtain a ternary system. This is done by dividing the components by the corresponding block dimensions. These divisions can be replaced by faster shift operations. Then the three ternary variables are mapped to an index in the range of $[0,26]$. In summary, the lookup table index computation for a position $(i, j, k)$ is:


\begin{displaymath}
\begin{array}{rcl}
i' & = & (((((i \;\&\; (B_x-1)) - 1) \;\&...
...ert\; 1) + 1) \gg N_z\\
index & = & 9i' + 3j' + k'
\end{array}\end{displaymath} (6.2)

The presented index computations can be performed efficiently on current CPUs, since they only consist of simple bit manipulations. The lookup tables can be used in raycasting on a bricked volume layout for efficient access to neighboring samples. The first table can be used if only the eight samples within a cell have to be accessed (e.g., if gradients have been pre-computed). The second table allows full access to a 26-neighborhood. Compared to the if-else solution which has the costly computation of two offsets in the else branch, we get a speedup of about 30%. The benefit varies, depending on the block dimensions. For a $32 \times 32 \times 32$ block size the else-branch has to be executed in 10% of the cases and for a $16 \times 16 \times 16$ block size in 18% of the cases.