The performance of full-featured ray tracers has historically been
limited by the hardware's floating-point computational power.
However, next-generation multithreaded, multicore architectures
promise sufficient CPU power to support real-time frame
rates. In such systems, the emerging bottleneck will be the limited
performance of the memory system, both in on-chip cache capacity and
in DRAM-to-cache bandwidth. This paper presents a novel ray tracing
algorithm that significantly improves cache utilization and
reduces DRAM-to-cache bandwidth demand. The key insight is to view ray traversal
as a scheduling problem, which allows our algorithm to match ray
traversal computations and intersection computations with available
system resources. Using a detailed simulator, we show that our
algorithm reduces the amount of geometry brought into the cache
by up to 32× for primary rays and up to 60× for shadow rays,
in exchange for the small overhead of maintaining the ray schedule.
Moreover, our algorithm creates units of work that are more
amenable to parallelization than those of traditional Whitted-style ray tracers.
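The abstract does not describe the scheduling algorithm itself. As a rough, hypothetical illustration of why treating traversal as a scheduling problem can reduce the geometry brought into the cache, the Python sketch below counts geometry-block loads under two orders. The first traces each ray to completion before starting the next, as a Whitted-style tracer does. The second queues rays at the block they need next and greedily loads the block with the most waiting rays. It assumes each ray's sequence of scene blocks is known in advance (in a real tracer the next block depends on intersection results), a small LRU cache for the depth-first case, and hypothetical function names. It is not the paper's algorithm.

```python
from collections import deque


def naive_block_loads(ray_paths, cache_blocks=1):
    """Trace each ray to completion before starting the next (depth-first
    order) and count geometry-block loads, assuming an LRU cache that
    holds at most `cache_blocks` blocks."""
    cache = deque()  # most recently used block sits at the right end
    loads = 0
    for path in ray_paths:
        for block in path:
            if block in cache:
                cache.remove(block)  # hit: refresh recency below
            else:
                loads += 1  # miss: bring the block's geometry into the cache
                if len(cache) == cache_blocks:
                    cache.popleft()  # evict the least recently used block
            cache.append(block)
    return loads


def scheduled_block_loads(ray_paths):
    """Queue every ray at the block it needs next, then repeatedly load the
    block with the most waiting rays and advance all of them together."""
    queues = {}
    for ray_id, path in enumerate(ray_paths):
        if path:
            queues.setdefault(path[0], deque()).append((ray_id, 0))
    loads = 0
    while queues:
        # Greedy scheduling choice (an assumption): amortize one load over
        # as many rays as possible.
        block = max(queues, key=lambda b: len(queues[b]))
        waiting = queues.pop(block)
        loads += 1
        for ray_id, step in waiting:
            next_step = step + 1
            path = ray_paths[ray_id]
            if next_step < len(path):
                queues.setdefault(path[next_step], deque()).append(
                    (ray_id, next_step)
                )
    return loads


# Four coherent rays that each cross scene blocks 0, 1 and 2.
paths = [[0, 1, 2]] * 4
print(naive_block_loads(paths))      # 12 loads with a one-block cache
print(scheduled_block_loads(paths))  # 3 loads: each block is fetched once
```

With a one-block cache the depth-first order refetches every block for every ray, while the scheduled order fetches each block once and processes all waiting rays against it. When the cache is large enough to hold the whole working set (for example `cache_blocks=3` here), the two orders load the same amount, which is consistent with the abstract's framing that the benefit appears when cache capacity is the limiting resource.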