Reuse by Specialization through Views
Gordon S. Novak Jr.
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712
Tel: (512) 471-9569, Fax: (512) 471-8885
Our approach to software reuse is to specialize generic software components to work together and to meet the requirements of the application. We have developed a compiler that can specialize generic procedures through views that describe how application data types correspond to abstract types. The software components of an application constrain each other in many ways; often these constraints are not represented in machine-readable form and are known only to the programmer. We are developing representations for software frameworks, including constraints among components. By representing constraints explicitly, propagating them, and making inferences from them, we hope to make it easy to generate programs by composition and specialization of generic components.
Keywords: views, generic algorithms, specialization, program frameworks, constraint propagation
Workshop Goals: Promote software generation by specialization of generic procedure through views.
Working Groups: transformational program generation, views, software architecture.
We have developed a compiler that can specialize generic procedures through views. A view describes how a concrete type implements features of an abstract type, in terms of which generic procedures are written. Given appropriate views, any procedure associated with the abstract type can be specialized so that it operates directly on the application data. Recursive in-line expansion of view definitions allows a large expansion from source code to output code; partial evaluation makes the resulting code efficient. Output code can be produced in a variety of programming languages by mechanical translation. An Automatic Programming Server (APS) has been constructed that allows a user to obtain programs over the web. The user describes the application data structures, makes views using interactive tools provided by the APS, and requests procedures to be generated. Source code for the generated procedures, in the desired language, is delivered to the user as a web page at the end of the session. APS allows generation of hundreds of lines of code with a few minutes of user interaction.
If whole programs could be generated by a system similar to APS, it would be much easier for programmers to produce programs by reuse of specialized generic components. Making such a system usable requires reducing the amount of detail the programmer must specify. Constraint propagation and inference may allow many details to be specified automatically.
The real software problem is a communication problem: how to communicate to the computer the program that we want. Detail is the enemy. The more details the human programmer must specify, the more difficult it is to write programs and the more difficult it is to change programs that already exist. Reuse methods that require the user to learn the details of the software that is to be reused fall prey to detail in another way, since human learning is slow and costly. The history of the development of programming languages has been one of removing detail that the human must specify and letting the computer handle it. Effective reuse requires minimizing the amount of detail the human user must learn and specify, and automating the generation of software.
Software should be constructed primarily by composition of reusable components, with little hand coding. To be reusable, components must be generic: as [Biggerstaff94] noted, the more specific choices that are hard-coded in a component, the less reusable it becomes. However, it must be possible to specialize the generics to fit the application, and the resulting specialized programs must be efficient. Components of an application strongly constrain each other. While some constraints are enforced automatically (e.g. by type checking), knowledge of the constraints that exist and production of code that meets the constraints are usually left to the programmer. Programming is difficult because a great deal of detail must be managed.
In order to solve the programming problem, it is necessary to reduce the amount of detail that must be specified by the programmer and to minimize the learning required to use a set of components. Our research represents explicitly the architecture of a program framework, including components and the constraints they place on each other. Constraint propagation can allow a decision that is specified at one place to propagate to other parts of the specification without human intervention. Knowledge-based inference can make obvious decisions automatically (subject to retraction by the programmer if desired), and these can be propagated as well. A graphical interface is planned that will represent the components, connections, and the specification status of the components. Components will help the user by suggesting choices and helping to construct specifications, thus reducing the need for learning.
We have developed a compiler that can specialize generic procedures through views. A view is analogous to a wrapper or adapter object [Gamma95]: it defines the properties that a generic procedure expects to see (in its abstract type) in terms of the concrete type of the application. Views allow writing of data through the view as well as reading data, and they maintain invariants that guarantee that a generic procedure will work correctly through a view [Novak95]. Because the translation of an access through a view is compiled in-line and optimized by partial evaluation [Jones93], the use of views often has little or no runtime cost compared to ordinary programs.
Views can be complex and difficult to write by hand. We have developed tools that make it easy to make views interactively, using menu selections or correspondences shown on diagrams that represent abstract concepts. These tools are combined in an Automatic Programming Server (APS) that allows a user to obtain specialized programs (in several languages) via the web.
A programmer can get parts of an application at almost no cost by using APS. The programmer designs the data structures to be used, describes the data to APS, makes views, and then requests procedures to be specialized. Hundreds of lines of code can be obtained in a few minutes of interaction. However, the programmer is still responsible for knowing the overall structure of the system and for putting the pieces together. APS would be much more useful if it could represent the structure of a whole application and maintain the constraints among components.
Our current research is on program frameworks that represent software components and constraints between them. A simple framework is iterate-accumulate: an iterator steps through an input sequence, producing a sequence of items, and an accumulator accumulates the items in some data structure. This framework models a variety of programs, e.g. adding the elements of an array, averaging a list of numbers, performing numerical integration of a function, counting occurrences of words in a file, etc. Using views, a single generic program for iterate-accumulate can be specialized into any of these instances.
Our goal is to allow an instance of a program framework to be created from a minimal specification. By explicit representation of facts and constraints and the use of inference rules, a minimal specification can be expanded into a complete and consistent program specification. For example, suppose the user specifies that the iterate-accumulate framework is to be used and that the input is a list of integers. The system can infer that the default iterator is used, i.e. to generate the integers from the list in sequence; thus, the item type is integer. The default accumulator for integers is to add them, starting from an initial value of 0, resulting in an integer output. Thus, the minimal specification of the framework and input type is enough to allow a fully specified program to be generated. By such inference and propagation of specifications from one component of the framework to others across constraints, a complete specification can be derived from minimal user input.
Some inferred choices may not be what the programmer wants; in the above example, the programmer might want to add the squares of the odd integers in the list. Inferred decisions (and the programmer's decisions) must be retractable, with specifications that depended on them being re-derived. Our representation keeps track of dependencies, so that decisions and their consequences can be modified. The programmer retains the ability to specify the unusual case if desired, without having to specify the usual case.
Software components can help the programmer to specify them by actively presenting choices for parameter values or component specializations. For example, if it is known that the input to an accumulator is an integer, the known accumulators of integers can be suggested (e.g. sum, average, statistics, histogram). Individual components can have GUI's that present the parameters that can be specified and suggest values. With such techniques, it will not be necessary for a programmer to read or memorize a reference manual.
A well-designed graphical interface can reduce the amount of user input required and make the interface faster and easier to understand. We have developed a system, VIP, that allows scientific programs to be generated from connections of diagrams that represent physical and mathematical principles [Novak94a]. We expect to develop a similar interface for specification of software from connections of generic components.
Our goal should be to make programming orders of magnitude faster than it is now. Reuse, with automation of program generation, is the only way to achieve this. In order to make reuse effective, reusable programs must be as generic as possible and be specialized for applications. It must be possible for the programmer to specify programs minimally, with inference providing the remainder of the specification.
[Krueger92] and [Mili95] are good surveys of software reuse, including criteria for practical effectiveness. Our work [Novak97] emphasizes transformational program generation. [Batory94] describes his approach using composition of well-structured program layers. [Smith90] describes work at Kestrel Institute on program generation from predicate calculus specifications; this is especially useful for generating combinatoric search programs. [Akers97] describes the SciNapse system for generating mathematical modeling software. [Goguen89] takes a formal approach to parameterized programming by composition of components. [Simonyi96] describes the Intentional Programming work at Microsoft Research, which emphasizes transformations performed on programs represented as abstract syntax trees. [Gries90] proposes program specialization by syntactic transformations on source code.
[Akers97] Akers, Robert, Elaine Kant, Curtis Randall, Stanley Steinberg, and Robert Young, "SciNapse: A Problem-Solving Environment for Partial Differential Equations," IEEE Computational Science and Engineering, July-September 1997, pp. 32-42.
[Batory94] Batory, Don, Jeff Thomas, and M. Sirkin,, "Reengineering a Complex Application Using a Scalable Data Structure Compiler," Proc. ACM SIGSOFT '94, Dec. 1994.
[Biggerstaff94] Biggerstaff, Ted, "The Library Scaling Problem and the Limits of Concrete Component Reuse," Proc. 3rd International Conf. on Software Reuse, Rio de Janeiro, Brazil, Nov. 1994.
[Gamma95] Gamma, Erich, R. Helm, R. Johnson, and J. Vlissides, "Design Patterns: Elements of Reusable Object-Oriented Software," Addison-Wesley, 1995.
[Goguen89] Goguen, J. A., "Principles of Parameterized Programming," in T. Biggerstaff and A. Perlis (eds), Software Reusability, ACM Press / Addison-Wesley, 1989.
[Gries90] Gries, David and D. Volpano, "The Transform -- a New Language Construct," Structured Programming, vol. 11, no. 1, pp. 1-10, 1990.
[Jones93] Jones, Neil, Carsten Gomard, and Peter Sestoft, Partial Evaluation and Automatic Program Generation, Prentice-Hall, 1993.
[Krueger92] Krueger, Charles, "Software Reuse, " ACM Computing Surveys, vol. 24, no. 2 (June 1992), pp 131-184.
[Mili95] Mili, H., F. Mili, and A. Mili, "Reusing Software: Issues and Research Directions," IEEE Trans. Soft. Engr., vol. 21, no. 6, pp. 528-562, June 1995.
[Novak97] Novak, Gordon, "Software Reuse by Specialization of Generic Procedures through Views," IEEE Trans. on Software Engineering, vol. 23, no. 7 (Jul. 1997), pp. 401-417.
[Novak95] Novak, Gordon, "Creation of Views for Reuse of Software with Different Data Representations," IEEE Trans. on Software Engineering, vol. 21, no. 12 (Dec. 1995), pp. 993-1005.
[Novak94a] Novak, Gordon, "Generating Programs from Connections of Physical Models," Proc. Tenth Conference on Artificial Intelligence for Applications (CAIA-94), San Antonio, Texas, March 1994, pp. 224-230.
[Novak94b] Novak, Gordon, "Composing Reusable Software Components through Views, "Proc. 9th Knowledge-Based Software Engineering Conference (KBSE-94), Monterey, CA, Sept. 1994, pp. 39-47.
[Simonyi96] Simonyi, Charles, "Intentional Programming - Innovation in the Legacy Age, Presented at IFIP WG 2.1 meeting, June 4, 1996. See their web page at http://www.research.microsoft.com/ip/default.htm
[Smith90] Smith, Douglas R., "KIDS: A Semiautomatic Program Development System" IEEE Trans. on Software Engineering, vol. 16, no. 9 (Sept. 1990), pp. 1024-1043.
Gordon S. Novak Jr. is Professor of Computer Sciences at the University of Texas at Austin, where he has been since 1978. He also has spent two years at Stanford (1981-83), one year at SRI International (1977-78), ten years programming and managing systems programming in industry at Tracor, Inc. (1966-76), and a summer at Hewlett Packard Labs (1983). He has a BS in EE and MA and PhD in CS from the University of Texas at Austin. He is the author of the Glisp language and compiler and the Isaac system that solves physics problems stated in English. Online demos (using X windows) of systems that generate software by specialization of generic algorithms are available via his web page.