Memory Management for Large Datasets

To compare bricked and linear volume layouts, we use a dual Intel Pentium Xeon 2.4 GHz system with 512 KB of level-2 cache, 8 KB of level-1 data cache, and 1 GB of Rambus memory.

Our system supports different block sizes, provided that each block dimension is a power of two. Setting the block size to the volume dimensions yields a conventional raycaster operating on a simple linear volume layout. This allows a meaningful comparison, within the same system, between raycasting on a linear volume layout and raycasting on a bricked volume layout. To quantify the effect of bricking, we benchmarked a range of block sizes. Figure 5.1 shows the speedup achieved by blockwise raycasting. For these tests we specified a translucent transfer function that defeats all high-level optimizations. In other words, the final image was produced by brute-force raycasting of the entire dataset. The size of the dataset had no influence on the optimal speedup.
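The power-of-two restriction lets a voxel address be computed with shifts and masks instead of divisions. The Python sketch below shows one possible bricked addressing scheme. The class `BrickedLayout`, the helper `linear_index`, and the x-fastest ordering of both the blocks and the voxels inside each block are hypothetical choices made for illustration. They are not necessarily the layout our system uses. The sketch also shows that a single volume-sized block reduces to the ordinary linear layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrickedLayout:
    """Hypothetical bricked layout; every size is a power of two, given as log2."""
    vol_log2: tuple[int, int, int]  # log2 of the volume size in x, y, z
    blk_log2: tuple[int, int, int]  # log2 of the block size in x, y, z

    def index(self, x: int, y: int, z: int) -> int:
        """Voxel index of (x, y, z). Blocks are stored x-fastest, and so are the
        voxels inside each block. Power-of-two sizes replace division and
        modulo by shifts and masks."""
        block = inner = 0
        # Horner scheme from the slowest (z) to the fastest (x) axis.
        for c, v, b in zip((z, y, x), self.vol_log2[::-1], self.blk_log2[::-1]):
            block = (block << (v - b)) | (c >> b)        # block coordinate
            inner = (inner << b) | (c & ((1 << b) - 1))  # offset inside block
        return (block << sum(self.blk_log2)) | inner


def linear_index(vol_log2, x, y, z):
    """Index in a conventional x-fastest linear layout."""
    vx, vy, _ = vol_log2
    return (z << (vx + vy)) | (y << vx) | x


bricked = BrickedLayout((8, 8, 8), (5, 5, 5))  # 256^3 volume, 32^3 blocks
single = BrickedLayout((8, 8, 8), (8, 8, 8))   # one volume-sized block
print(bricked.index(33, 1, 0))                 # → 32801 (block 1, offset 33)
print(single.index(33, 1, 0) == linear_index((8, 8, 8), 33, 1, 0))  # → True
```

In C or C++ the same shifts and masks map directly onto integer instructions. Python is used here only for brevity.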

We also performed a worst-case comparison with respect to the viewing direction. For small blocks, the worst case is close to the best case. With large bricks, in contrast, performance drops sharply for certain viewing directions. This reflects the well-known view dependence of a raycaster operating on a linear volume layout: when rays traverse the volume along its slowest-varying axis, consecutive samples lie far apart in memory and cache reuse suffers. The nearly constant performance of small blocks, with almost no view-dependent variation, is one of the main advantages of a bricked volume layout.
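A rough way to see this view dependence is to compare how far apart in memory the samples of a ray lie when it steps along each axis. The sketch below reports the address span covered by 32 consecutive axis-aligned samples. It assumes 16-bit voxels, a $256^3$ volume, and cubic $32^3$ blocks stored in x-fastest order. The helper names are hypothetical. The sketch ignores cache line size, associativity, and prefetching, so it is only a proxy for actual cache behavior.

```python
from functools import partial

VOXEL_BYTES = 2  # assumption: 16-bit voxels (32^3 voxels = 64 KB)


def linear_offset(dims, x, y, z):
    """Byte offset of voxel (x, y, z) in an x-fastest linear layout."""
    nx, ny, _ = dims
    return ((z * ny + y) * nx + x) * VOXEL_BYTES


def bricked_offset(dims, b, x, y, z):
    """Byte offset in a layout of cubic b^3 blocks (blocks and voxels x-fastest)."""
    nx, ny, _ = dims
    bx, by = nx // b, ny // b  # number of blocks along x and y
    block = ((z // b) * by + y // b) * bx + x // b
    inner = ((z % b) * b + y % b) * b + x % b
    return (block * b ** 3 + inner) * VOXEL_BYTES


def ray_span(offset, axis, n):
    """Bytes between the lowest and highest address of n samples stepping
    from the origin along one axis (0 = x, 1 = y, 2 = z)."""
    addrs = []
    for i in range(n):
        p = [0, 0, 0]
        p[axis] = i
        addrs.append(offset(*p))
    return max(addrs) - min(addrs) + VOXEL_BYTES


dims = (256, 256, 256)
linear = partial(linear_offset, dims)
bricked = partial(bricked_offset, dims, 32)
for axis, name in enumerate("xyz"):
    print(name, ray_span(linear, axis, 32), ray_span(bricked, axis, 32))
# x 64 64
# y 15874 1986
# z 4063234 63490
```

In the linear layout the span grows by a factor of more than 60,000 from the x axis to the z axis. In the bricked layout it stays within a single 64 KB block for every axis.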

Reading the chart in Figure 5.1 from left to right, a block size of 1 KB already yields a speedup of about 2.0. Increasing the block size up to 64 KB further increases the speedup, due to more efficient use of the cache. The optimum lies at a block size of 64 KB ($32 \times 32 \times 32$), with a speedup of about 2.8. This block size is the best tradeoff between the cache space needed for ray data structures, sample data, and lookup tables. Larger blocks reduce performance: they are too large for the cache, yet still incur the overhead caused by bricking. This drop-off diminishes as the block size approaches the volume size. With a single volume-sized block, the renderer corresponds to a conventional raycaster operating on a linear volume layout.
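The text equates 64 KB with a $32 \times 32 \times 32$ block, which implies 2 bytes per voxel. Under that inferred voxel size, the 1 KB point corresponds to an $8^3$ block. The sketch below works through this arithmetic and compares cubic block sizes with the 512 KB level-2 cache of the test system. The names `block_bytes` and `cubic_edge` are hypothetical. The sketch does not model ray data structures or lookup tables, so it cannot predict the optimum on its own.

```python
L2_CACHE_BYTES = 512 * 1024  # level-2 cache of the benchmark system
VOXEL_BYTES = 2              # inferred, not stated: 32^3 voxels in 64 KB


def block_bytes(edge: int) -> int:
    """Bytes occupied by a cubic block of edge x edge x edge voxels."""
    return edge ** 3 * VOXEL_BYTES


def cubic_edge(size_bytes: int) -> int:
    """Edge length of the cubic power-of-two block occupying size_bytes."""
    voxels, rest = divmod(size_bytes, VOXEL_BYTES)
    e3 = voxels.bit_length() - 1  # log2(voxels) when voxels is a power of two
    if rest or voxels <= 0 or voxels != 1 << e3 or e3 % 3:
        raise ValueError("not a cubic power-of-two block")
    return 1 << (e3 // 3)


print(cubic_edge(1024), cubic_edge(64 * 1024))  # → 8 32
for edge in (8, 16, 32, 64):
    spare = L2_CACHE_BYTES - block_bytes(edge)
    print(f"{edge}^3: {block_bytes(edge) // 1024} KB block, "
          f"{spare // 1024} KB of L2 left")
```

A $64^3$ block would already fill the entire level-2 cache, leaving no room for the ray data structures, sample data, and lookup tables mentioned above.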

**Figure 5.1:** Block-based raycasting speedup compared to raycasting on a linear volume layout
