Quarterly Status Report

Performance Modeling

An Environment For End-to-End Performance Design of

Large-Scale Parallel Adaptive Computer/Communications Systems

for the period August 1, 1999 to October 31, 1999

Contract N66001-97-C-8533

CDRL A001

 

1.0 Purpose of Report

This status report is the quarterly contract deliverable (CDRL A001), which summarizes the effort expended by the University of Texas, Austin team in support of Performance Modeling on Contract N66001-97-C-8533.

2.0 Project Members

University of Texas (prime): 420 hours

Sub-contractor (Purdue): 80 hours

Sub-contractor (UT-El Paso): 1,232 hours

Sub-contractor (UCLA): 696 hours

Sub-contractor (Rice): 433 hours

Sub-contractor (Wisconsin): 380 hours

Sub-contractor (Los Alamos): 0 hours

3.0 Project Description (last modified 07/97)

3.1 Objective

The goals of this project are: (1) to develop a comprehensive environment (POEMS) for end-to-end performance analysis of large, heterogeneous, adaptive, parallel/distributed computer and communication systems, and (2) to demonstrate the use of the environment in analyzing and improving the performance of defense-critical parallel and distributed systems.

3.2 Approach

The project combines innovations from a number of domains (communication, data mediation, parallel programming, performance modeling, software engineering, and CAD/CAE) to realize the goals. First, we will develop a specification language based on a general model of parallel computation with specializations to representation of workload, hardware and software. To enable direct use of programs as workload specifications, compilation environments such as dHPF will be adapted to generate executable models of parallel computation at specified levels of abstraction.

Second, we will experimentally and incrementally develop and validate scaleable models. This will involve using multi-scale models, multi-paradigm models, and parallel model execution in complementary ways. Multi-scale models will allow different components of a system to be modeled at varying levels of detail via the use of adaptive module interfaces, supported by the specification language. Multi-paradigm models will allow an analyst to use the modeling paradigm—analytical, simulation, or the software or hardware system itself—that is most appropriate with respect to the goals of the performance study. Integration of an associative model of communication with data mediation methods to provide adaptive component interfaces will allow us to compose disparate models in a common modeling framework. To handle computationally expensive simulations of critical subsystems in a complex system, we will incorporate parallel simulation technology based on the Maisie language.

Third, we will provide a library of models, at multiple levels of granularity, for modeling scaleable systems like those envisaged under the DOE ASCI program, and for modeling complex adaptive systems like those envisaged under the GloMo and Quorum programs.

Finally, we will provide a knowledge base of performance data that can be used to predict the performance properties of standard algorithms as a function of architectural characteristics.
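As an illustration of the kind of query such a knowledge base could answer, the sketch below records measured runs keyed by architectural characteristics and predicts the runtime of a new configuration by nearest-neighbor lookup. The class, field names, and sample numbers are all hypothetical; PYTHIA-II's actual schema and inference methods are described in the project's papers.

```python
import math

class PerfKnowledgeBase:
    """Hypothetical sketch of a performance knowledge base (not the
    PYTHIA-II schema): measured runs of standard algorithms, keyed by
    architectural characteristics."""

    def __init__(self):
        self.records = []  # (algorithm, feature dict, runtime in seconds)

    def add(self, algorithm, features, runtime):
        self.records.append((algorithm, features, runtime))

    def predict(self, algorithm, features):
        """Return the runtime of the closest recorded configuration,
        using Euclidean distance over the architectural features."""
        best = None
        for alg, feats, t in self.records:
            if alg != algorithm:
                continue
            d = math.dist([feats[k] for k in sorted(feats)],
                          [features[k] for k in sorted(features)])
            if best is None or d < best[0]:
                best = (d, t)
        return best[1] if best else None

# Illustrative (made-up) measurements and query.
kb = PerfKnowledgeBase()
kb.add("sweep3d", {"procs": 64, "mhz": 200}, 120.0)
kb.add("sweep3d", {"procs": 128, "mhz": 200}, 70.0)
est = kb.predict("sweep3d", {"procs": 100, "mhz": 200})
```

Real systems would use regression or statistical inference rather than a single nearest neighbor, but the interface — insert measured data, query by architectural characteristics — is the essential shape.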

4.0 Performance Against Plan

4.1 Spending – Spending has caught up with the plan. All of the subcontracts except for LANL are in place. After this quarter, the spending rate for the project will run at approximately the planned rate.

4.2 Task Completion - A summary of the completion status of each task in the SOW follows. Because several participants are involved in most tasks, the completion estimates for tasks in progress carry some uncertainty. Assessments of task completion by the participating institutions are given in their individual progress reports.

Task 1 - 95% Complete - Methodology development is an iterative process. One develops a version of the methodology, applies it and revises the methodology according to the success attained in the application. Evaluation of the methodology is in progress with the analysis of the performance of Sweep3D on the SP2 family of architectures. Closure will come with completion of Task 7 when validation of the methodology on the first end-to-end performance model has been completed.

Task 2 - Complete

Task 3 - 90% Complete - Specification languages for all three domains have been proposed and are in various states of completion.

Task 4 - 75% Complete - Task graphs can now be developed for most HPF programs and work on MPI programs is well underway.

Task 5 - 75% Complete - The compiler for the specification language is well into development. Use of the compilation methods developed for the CODE parallel programming system at UT-Austin has accelerated this task.

Task 6 - 55% Complete - The initial library of components has been specified and instantiation has begun. (See the progress reports from UTEP and Wisconsin for details.)

Task 7 - 40% Complete - Subtask or Phase 1 of this task is about 50% complete. (See the progress reports from UCLA and Wisconsin for details.)

Task 8 - 55% Complete

Task 9 – Task 9 has been partitioned into seven subtasks. Subtasks 9.1, 9.2, and 9.3 are complete. Subtask 9.4 is 40% complete, 9.5 is 25% complete, and 9.6 is 10% complete. Subtask 9.7 has not yet been initiated.

Task 10 - 0% Complete

Task 11 - 0% Complete

5.0 Major Accomplishments to Date

5.1 Project Management

a) Long Term Workplan

POEMS has generated the framework for end-to-end performance modeling and has developed initial versions of several major components. This year has been designated the "Year of Integration": the long-term goal for the year ending October 31, 1999 is integration of the POEMS components. This will enable the project to spend the bulk of its third year applying POEMS to further example systems.

5.2 Technical Accomplishments

a. Knowledge Base

Purdue reports the following items on development of the knowledge base:

    * The Ifestos system was renamed PYTHIA-II and is now operating smoothly.

    * The journal paper describing the PYTHIA-II system was revised for publication.

    * A preliminary schema for the POEMS database was defined.

b. Task Graph-MPISIM Integration

UCLA and Rice continued to work on evaluating the integrated dHPF / MPI-Sim system we have developed to facilitate simulation of systems with thousands of processors and of the realistic problem sizes expected on such large systems.

The major accomplishment this quarter was to demonstrate the effectiveness of the combined dHPF / MPI-Sim system. In particular, we obtained the following experimental results:

* The integrated system is able to simulate *unmodified* High Performance Fortran (HPF) programs compiled to the Message-Passing Interface standard (MPI) by the dHPF compiler. In the future, we expect to simulate MPI programs as well.

* On three standard benchmarks and a wide range of problem and system sizes, the optimized simulator has errors of less than 17% compared with direct program measurement in all the cases we studied, and typically much smaller errors.

* Furthermore, the optimized simulator requires factors of 5 to 2000 less memory and up to a factor of 10 less time to execute than the original simulator. These dramatic savings allow us to simulate systems and problem sizes 10 to 100 times larger than is possible with the original simulator.

These results corroborate and further *strengthen* the earlier results we had obtained via hand experiments before the compiler implementation was performed. Those experiments had used manually modified versions of existing message-passing programs. Overall, these results show that we have been able to improve the state of the art of simulation of parallel message-passing systems by more than one order of magnitude. The complete results have been included in the final version of the paper that will appear in the SC99 conference proceedings.

c. Development and Validation of the Hardware Domain Component Library

UTEP completed validation of SimpleScalar against the R10000 and PowerPC 604e by comparing simulated execution times with actual execution times for a block of work. Validation experiment results, as well as measurements of dilation, appear in the document entitled "Summary of SimpleScalar Experiments," which is updated as necessary with new experimental results.

A study of Sweep3D's microarchitecture resource needs was completed by UTEP. This study focuses on the causes of stalls. It is the first step in investigating the performance of Sweep3D on next-generation processor architectures, which will lead into collaborative work with the University of Wisconsin-Madison and UCLA on running Sweep3D on multiprocessors with next-generation processors. This work will pair the LogGP model with SimpleScalar simulation, and MPI-SIM with SimpleScalar. The results of this study are presented in "Determination of Sweep3D's Reported Processor Utilization Using SimpleScalar Configured as a PowerPC 604e," a graduate project report by Jaideep Moses. A summary of these initial findings is also presented in "Summary of SimpleScalar Experiments."
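The analytic half of the planned LogGP/SimpleScalar pairing can be illustrated with the standard LogGP point-to-point cost formula. The sketch below is illustrative only; the parameter values are made-up placeholders, not measured POEMS data.

```python
# LogGP cost of one k-byte point-to-point message:
# sender overhead o, per-byte Gap G for each byte beyond the first,
# network latency L, then receiver overhead o.
def loggp_send_time(k_bytes, L, o, G):
    return o + (k_bytes - 1) * G + L + o

# Illustrative (made-up) parameters, in microseconds.
L_us, o_us, G_us = 10.0, 2.0, 0.01
t = loggp_send_time(4096, L_us, o_us, G_us)  # cost of a 4 KB message
```

In the envisioned pairing, per-processor computation intervals would come from SimpleScalar's cycle-level simulation while communication intervals between them are charged using an analytic formula of this kind.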

UTEP continued a study of Sweep3D's memory resource needs. Initial results of this study also appear in "Summary of SimpleScalar Experiments."

UTEP continued work on a project that studies the accuracy of the R10000 performance counters.

d. Methodology Definition and Specification Language

UTEP and UT-Austin collaborated on developing an example of the methodology and specification language for the TSE paper. Work related to the design and development of the end-to-end performance model, which is in progress, will complete this effort.

6.0 Artifacts Developed

6.1 Technical Papers

The final version of the paper "Compiler-Supported Simulation of Very Large Parallel Applications" has been completed. The authors are Vikram Adve, Rajive Bagrodia, Ewa Deelman, Thomas Phan, and Rizos Sakellariou. The paper will appear in the Proceedings of Supercomputing'99, November 1999.

6.2 Software

A new MPI-SIM-AM version of the MPI-SIM simulator has been developed. It allows for the input of an MPI program in a simplified task graph form.
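For illustration, a "simplified task graph form" might look like the sketch below: computation and communication tasks connected by precedence edges, which a simulator can traverse in a valid execution order. The field names and the topological-ordering step are illustrative assumptions, not the actual MPI-SIM-AM input format.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One node of a hypothetical simplified task graph."""
    tid: int
    kind: str                 # "compute", "send", or "recv"
    cost: float = 0.0         # estimated compute time or message size
    successors: list = field(default_factory=list)  # tids that must follow

def topological_order(tasks):
    """Return task ids in a precedence-respecting order (Kahn's algorithm),
    the order in which a simulator could process the graph."""
    indeg = {t.tid: 0 for t in tasks}
    for t in tasks:
        for s in t.successors:
            indeg[s] += 1
    by_id = {t.tid: t for t in tasks}
    ready = [tid for tid, d in indeg.items() if d == 0]
    order = []
    while ready:
        tid = ready.pop()
        order.append(tid)
        for s in by_id[tid].successors:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

# A trivial compute -> send -> recv chain.
chain = [Task(0, "compute", 1.0, [1]),
         Task(1, "send", 4096, [2]),
         Task(2, "recv", 4096)]
order = topological_order(chain)
```

The appeal of such a form is that the simulator need not re-execute the application's computation; it only replays abstracted task costs and communication events.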

7.0 Issues

7.1 Open Issues with no Plan for Resolution

None

7.2 Open Issues with Plan for Resolution

* Development of methodology examples.

* Completion of the PSL compiler.

7.3 Issues Resolved

Integration of the MPI-SIM simulator with the output of the task graph compiler.

8.0 Near-term Plan

The near term plan focuses on completing components of POEMS and getting them ready for integration.

a. Knowledge-Based System

b. Models and Model Evaluation

UCLA and Rice will continue evaluation of the MPI-Sim/Task Graph model.

Rice and UT-Austin will continue our collaboration to interface the static and dynamic task graphs into the POEMS Specification Language Compiler.

Rice and UCLA will investigate how to build on the successful collaboration on compiler-supported simulation of message-passing programs, which are essentially a special case of distributed, network-intensive programs.

UTEP will distribute a new version of the "Hardware Domain Component Model Specification" document, which will address feedback from the UCLA and LANL Poets and will include the specification of the LLNL-SP/2 Power604e and a description of SimpleScalar.

UTEP will use SimpleScalar to investigate the performance of Sweep3D on next-generation architectures and continue memory performance studies for Sweep3D.

UTEP will complete the project on evaluation of accuracy of the R10000 performance counters.

c. Frameworks, Specification Language and Compilers

UTEP and UT-Austin will complete the methodology and Specification Language example for the TSE paper.

9.0 Completed Travel

* Travel of Houstis and Rice to a conference (Second Japan Workshop on Problem Solving Environments, Kanazawa, Japan).

* Vikram Adve, Rizos Sakellariou, John Rice, Jim Browne, Emery Berger, Ewa Deelman and Rajive Bagrodia attended the POEMS project group meeting held at UTEP on August 10, 1999.

10.0 Equipment

None Acquired

11.0 Summary of Activity

11.1 Work Focus:

The foci for activities are broken out by topic.

a) Knowledge Base

The Purdue focus is on:

* Completion of the data acquisition for Tasks 9.4 and 9.5.

* Plans for incorporating data into our knowledge base from other POEMS investigators.

b. Models, Model Evaluation and Modeling

UCLA-Rice

Validation of the new MPI-Sim/Task Graph model: MPI-SIM-AM.

UTEP

* Hardware domain component model specification and implementation, in particular, implementation of the processor/memory subsystem component model/simulator of the LLNL-SP/2 Power604e and the SGI O2K MIPS R10000.

* Interfacing of processor/memory subsystem simulation and MPI-SIM.

* Research w.r.t. next generation processors/memory systems and microarchitecture performance analysis.

c. Framework – Methodology, Specification Language and Compiler

UT-Austin will work on completion of the PSL compiler.

UTEP and UT-Austin will work on completion of PSL examples.

11.2 Significant Events

a. Project Meeting

The POEMS project group meeting was held on August 10, 1999. This meeting developed a detailed work plan for the coming year. The key points of the work plan include:

(a) plans for the completion and submission of task reports;

(b) a plan for addressing the integration issues that arise in integrating the various components of the POEMS system; and

(c) a plan to initiate development of the POEMS performance model database.

b. Knowledge Base

* PYTHIA-II is running smoothly.

c. Models, Model Evaluation and Modeling

UCLA-Rice

The paper entitled "Compiler-Supported Simulation of Very Large Parallel Applications," with authors Vikram Adve, Rajive Bagrodia, Ewa Deelman, Thomas Phan, and Rizos Sakellariou, was accepted for presentation at Supercomputing'99.

The substantial benefits of the compiler-supported simulation techniques in the integrated dHPF / MPI-Sim approach reported in this paper indicate that we have effectively improved the state of the art of simulation of parallel message-passing systems by more than one order of magnitude.

d. Personnel Change

Dr. Vikram Adve has left Rice University to take up a faculty position at the University of Illinois. He will continue to be an active principal investigator on the POEMS project, working from Illinois. The subcontract to Rice University is being terminated effective August 1999, following Dr. Adve's departure (since he was the lead investigator for this effort at Rice).

FINANCIAL INFORMATION:

Contract #: N66001-97-C-8533

Contract Period of Performance: 7/24/97-7/23/00

Ceiling Value: $1,839,517

Reporting Period: 8/1/99-10/31/99

Actual Vouchered (all costs to be reported as fully burdened; do not report overhead, G&A, and fee separately):

Current Period

Prime Contractor                Hours         Cost
Labor                             420    10,328.89
ODC's                                    36,289.91
Sub-contractor 1 (Purdue)          80     5,711.41
Sub-contractor 2 (UT-El Paso)   1,232    20,933.86
Sub-contractor 3 (UCLA)           696    45,653.95
Sub-contractor 4 (Rice)           433    21,793.60
Sub-contractor 5 (Wisconsin)      380    42,142.85
Sub-contractor 6 (Los Alamos)       0         0.00
TOTAL:                          3,241   182,854.47

 

Cumulative to Date

Prime Contractor                Hours         Cost
Labor                           8,645   264,494.15
ODC's                                   348,251.93
Sub-contractor 1 (Purdue)       1,288   100,944.22
Sub-contractor 2 (UT-El Paso)   4,578   169,348.92
Sub-contractor 3 (UCLA)         3,372   180,963.42
Sub-contractor 4 (Rice)         3,281   186,276.06
Sub-contractor 5 (Wisconsin)    2,038   128,201.25
Sub-contractor 6 (Los Alamos)       0         0.00
TOTAL:                         23,202 1,378,479.95