The Fast Fourier Transform (FFT) plays a key role in many areas of computational science and engineering. Although in most cases, 1-dimensional FFT problem sizes are small enough (at most 100,000 points or so) to allow them to be computed in memory, there is another class of problems for which the problem size can be so large that it exceeds the memory capacity of even large supercomputers.
In this talk, I will discuss how to perform 1-dimensional FFTs on distributed-memory multiprocessors when the data reside on a parallel disk system with independent disk accesses. We will examine several multiprocessor implementations, which differ in how the data are organized on the disks and in the distributed memory. They perform differing amounts of I/O and communication. Two of the methods have the remarkable property that all interprocessor communication occurs outside the mini-butterfly computations; communication that ordinarily occurs in a butterfly is folded into other data-movement operations that are performed anyway. Performance results indicate that these new methods are usually faster. Moreover, they are much easier to implement.
Because the FFT computation is deterministic and oblivious, if we know the interprocessor communication costs per message and per byte, an analysis program can quickly determine which method to use for a given problem size and machine configuration.
Joint work with David Nicol and Jake Wegmann.
Back to LESSLast modified: September 3, 1998