|
The performance of full-featured ray tracers has historically been
limited by the hardware’s floating-point computational power.
However, next-generation multi-threaded, multi-core architectures
promise to provide sufficient CPU throughput to support real-time
frame rates. In such systems, limited memory system performance,
in terms of both on-chip cache and DRAM-to-cache bandwidth, is
likely to bound overall system performance. This paper presents
a novel ray tracing algorithm that both improves cache utilization
and reduces DRAM-to-cache bandwidth usage. The key insight is
to view ray traversal as a scheduling problem, which allows our algorithm
to match ray traversal computations and intersection computations
with available system resources. Using a detailed simulator,
we show that our algorithm significantly reduces the amount
of data brought into the cache in exchange for the small overhead
of maintaining the ray schedule. Moreover, our algorithm creates
units of work that are more amenable to parallelization than those of
traditional Whitted-style ray tracers.
|