Memory-Efficient Acceleration Data Structures

To demonstrate the impact of our high-level optimizations, we used a commodity notebook system equipped with an Intel Centrino 1.6 GHz CPU, 1 MB level-2 cache, and 1 GB RAM. This system has a single CPU and does not support Hyper-Threading, so the presented results reflect only the performance gains due to our high-level acceleration techniques.

The memory consumption of the gradient cache does not depend on the volume dimensions, but is determined by the fixed block size. We use $32 \times 32 \times 32$ blocks; as gradients are cached at the $33^3$ sample positions of a block (including the boundary samples shared with neighboring blocks) and each gradient has three 4-byte components, the size of the gradient cache is $33^3 \cdot 3 \cdot 4$ byte $\approx$ 422 KB. Additionally, we store a validity bit for each cache entry, which adds $33^3 / 8$ byte $\approx$ 4.39 KB.
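
The following sketch illustrates such a per-block gradient cache. The names (GradientCache, Vec3, get) are illustrative placeholders rather than our actual implementation; it assumes block-local sample coordinates and a caller-supplied gradient estimator.

\begin{verbatim}
// Sketch of a per-block gradient cache (illustrative, not the actual
// implementation). A 32^3 block has 33^3 cached gradient positions,
// since the samples on the block boundary are included.
#include <array>
#include <bitset>
#include <cstddef>

struct Vec3 { float x, y, z; };                 // 3 * 4 byte per gradient

constexpr std::size_t N = 33;                   // cached samples per axis
constexpr std::size_t ENTRIES = N * N * N;      // 35937 entries

struct GradientCache {
    std::array<Vec3, ENTRIES> gradient;         // 35937 * 12 byte, ~422 KB
    std::bitset<ENTRIES> valid;                 // 35937 bit, ~4.39 KB

    static std::size_t index(std::size_t x, std::size_t y, std::size_t z) {
        return (z * N + y) * N + x;             // block-local coordinates
    }

    // Return the cached gradient, computing it on first access only.
    template <class ComputeFn>
    const Vec3& get(std::size_t x, std::size_t y, std::size_t z,
                    ComputeFn compute) {
        const std::size_t i = index(x, y, z);
        if (!valid[i]) { gradient[i] = compute(x, y, z); valid[i] = true; }
        return gradient[i];
    }

    void reset() { valid.reset(); }             // invalidate for next block
};
\end{verbatim}

A cache like this would typically be allocated once on the heap and reused, so that only the validity bits have to be cleared when processing moves to the next block.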

Figure 5.4 shows the effect of per-block gradient caching compared to per-cell gradient caching and to no gradient caching at all. Per-cell gradient caching means that gradients are reused for multiple resample locations within a cell. We chose an opacity transfer function that enforces translucent rendering. The charts show, from left to right, timings for object sample distances from 1.0 down to 0.125 at three different zoom factors: 0.5, 1.0, and 2.0. At a zoom factor of 1.0 there is one ray per cell; even here, per-block gradient caching performs better than per-cell gradient caching, because gradients are shared between neighboring cells. When zooming out (0.5), both gradient caching schemes perform equally well: the rays are so far apart that almost no gradients can be shared. When zooming in (2.0), on the other hand, per-block caching performs much better than per-cell caching due to the increased number of rays per cell. For this zoom factor, per-block gradient caching achieves a speedup of approximately 3.0 over no gradient caching at a typical object sample distance of 0.5.
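
To make this reuse explicit, the sketch below (building on the GradientCache sketch above; gradientAt and the compute callback are again hypothetical names) shows how every resample location interpolates the eight cell-corner gradients fetched from the block-wide cache. The corners are therefore shared by all resample locations of one cell and, unlike in per-cell caching, also by neighboring cells and by all rays crossing the block.

\begin{verbatim}
// Sketch of gradient reuse at a resample location (illustrative).
// With per-block caching the eight corner gradients of a cell are
// shared with neighboring cells and with all rays crossing the block.
#include <functional>

inline Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + t * (b.x - a.x),
             a.y + t * (b.y - a.y),
             a.z + t * (b.z - a.z) };
}

// 'compute' is any gradient estimator, e.g. central differences on the
// volume data (hypothetical callback); px, py, pz are block-local
// sample coordinates in [0, 32).
Vec3 gradientAt(GradientCache& cache, float px, float py, float pz,
                const std::function<Vec3(std::size_t, std::size_t,
                                          std::size_t)>& compute)
{
    const std::size_t x = static_cast<std::size_t>(px);  // lower corner
    const std::size_t y = static_cast<std::size_t>(py);
    const std::size_t z = static_cast<std::size_t>(pz);
    const float fx = px - x, fy = py - y, fz = pz - z;    // offsets in cell

    // Fetch (or lazily compute) the corner gradients from the cache.
    auto g = [&](std::size_t dx, std::size_t dy, std::size_t dz) {
        return cache.get(x + dx, y + dy, z + dz, compute);
    };

    // Trilinear interpolation of the cached corner gradients.
    Vec3 g00 = lerp(g(0,0,0), g(1,0,0), fx), g10 = lerp(g(0,1,0), g(1,1,0), fx);
    Vec3 g01 = lerp(g(0,0,1), g(1,0,1), fx), g11 = lerp(g(0,1,1), g(1,1,1), fx);
    return lerp(lerp(g00, g10, fy), lerp(g01, g11, fy), fz);
}
\end{verbatim}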

Figure 5.4: Comparison of different gradient caching strategies
\includegraphics{results/images/gradientcache.eps}

The additional memory usage of the acceleration data structures is rather low. The cell invisibility cache has a size of $32^3$ bit = 4096 bytes. The min-max octree has a depth of three and stores 4 bytes at each node (a 2-byte minimum and a 2-byte maximum value), which requires at most 2340 bytes. Additionally, the classification information is stored, which requires 66 bytes. We use blocks of size $32 \times 32 \times 32$ storing 2 bytes for each sample, i.e., 65536 bytes of volume data per block. The acceleration data structures thus add 4096 + 2340 + 66 = 6502 bytes per block, increasing the total memory requirements by approximately 10%.
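
The per-block layout and the byte counts quoted above can be sketched as follows; the struct and constant names are illustrative, and the 66-byte classification record is treated as an opaque constant.

\begin{verbatim}
// Sketch of the per-block acceleration data (illustrative layout); the
// constants reproduce the byte counts quoted in the text.
#include <cstddef>
#include <cstdint>

constexpr std::size_t B = 32;                   // block size

// Min-max octree node: 2-byte minimum and 2-byte maximum value.
struct MinMaxNode { std::uint16_t min, max; };

// Depth three: 1 + 8 + 64 + 512 = 585 nodes.
constexpr std::size_t OCTREE_NODES = 1 + 8 + 64 + 512;

struct BlockAcceleration {
    std::uint8_t cellInvisible[B * B * B / 8];  // 32^3 bit  = 4096 byte
    MinMaxNode   octree[OCTREE_NODES];          // 585 * 4   = 2340 byte
};

constexpr std::size_t BLOCK_DATA  = B * B * B * 2;  // 2-byte samples: 65536 byte
constexpr std::size_t CLASSIFY    = 66;             // classification information
constexpr std::size_t ACCEL_BYTES = sizeof(BlockAcceleration) + CLASSIFY;

static_assert(sizeof(MinMaxNode) == 4, "4 byte per octree node");
static_assert(ACCEL_BYTES == 4096 + 2340 + 66, "6502 byte per block");
// 6502 / 65536 is roughly 9.9%, i.e. about 10% additional memory per block.
\end{verbatim}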

Figure 5.5 compares our acceleration techniques for three large medical datasets. The fourth column of the table lists the render times for entry point determination at block granularity. The fifth column shows the render times for octree-based entry point determination, and the last column gives the render times for octree-based entry point determination combined with cell invisibility caching. Typically, about 2 frames per second are achieved for these large datasets.

Figure 5.5: Acceleration techniques tested on different datasets. Column four lists the render times for entry point determination at block level. The fifth column gives the render times for entry point determination using octree projection. The last column lists render times for octree projection plus additional cell invisibility caching.
\includegraphics{results/images/images.eps}



\begin{tabular}{llrrrr}
Image & Dimensions & Size & Block & Octree & Cell \\
\hline
(a) & $587 \times 341 \times 1878$ & 0.70 GB & 0.61 s & 0.46 s & 0.40 s \\
(b) & $587 \times 341 \times 1878$ & 0.70 GB & 0.68 s & 0.53 s & 0.45 s \\
(c) & $512 \times 512 \times 1112$ & 0.54 GB & 1.16 s & 0.93 s & 0.61 s \\
(d) & $512 \times 512 \times 1202$ & 0.59 GB & 0.86 s & 0.70 s & 0.64 s \\
(e) & $512 \times 512 \times 1202$ & 0.59 GB & 0.69 s & 0.46 s & 0.37 s \\
\end{tabular}