
6 Conclusion

In this paper, we have presented algorithms for the Cholesky, LU, and QR factorizations and shown how they can be implemented for distributed memory parallel architectures at a high level of abstraction. Although this high level of abstraction currently incurs considerable overhead, that overhead becomes much less noticeable for large enough problems. Indeed, by implementing more ambitious algorithms, considerable performance gains can be achieved compared to more traditional approaches. We believe that by optimizing the underlying infrastructure, the overhead can be greatly reduced, so that high performance will be attained even for smaller problems.

