Dissertations and Theses

Here are some (not all) of the dissertations and theses of our research group listed in order of publication date. Publications with students are listed on another page. Each entry below has a citation, an abstract, and a hypertext link to a PDF document. Click here for copyright notice.

To search for multiple keywords separate each keyword by a comma: keyword , keyword

Keywords include:

architectures, analysis, aspects, categories, commuting diagrams, computational design, derivatives, Dense Linear Algebra (DLA),
Design by Transformation (DxT), design rules, Design Rule Checking (DRC), evolution, feature interactions, feature models, geodesic,
higher-order transformations, kubes, MDA, MDD, MDE, mixin layers, optimization, optional feature problem, origami, refactoring,
streaming, safe composition, semantics, testing, UML, verification, wizards, P3 (P1/P2/P3/DiSTiL data structure generator).

See all student publications

Total Publications: 23

J. Kim. Reflective and Relativistic Refactoring with Feature-Awareness. Ph.D. dissertation, 2017, The Department of Computer Sciences, The University of Texas at Austin.

Refactoring is a core technology in modern software development. It is central to popular software design movements, such as Extreme Programming and Agile software development, and all major Integrated Development Environments (IDEs) today offer some form of refactoring support. Despite this, refactoring engines have languished behind research. Modern IDEs offer no means to sequence refactorings to automate program changes. Further, current refactoring engines exhibit problems of speed and expressivity, which makes writing composite refactorings such as design patterns infeasible. Even worse, existing refactoring tools for Object-Oriented languages are unaware of configurations in Software Product Lines (SPLs) codebases. With this motivation in mind, this dissertation makes three contributions to address these issues: First, we present the Java API library, called R2, to script Eclipse refactorings to retrofit design patterns into existing programs. We encoded 18 out of 23 design patterns described by Gang-of-Four as R2 scripts and explain why the remaining refactorings are inappropriate for refactoring engines. R2 sheds light on why refactoring speed and expressiveness are critical issues for scripting. Second, we present a new Java refactoring engine, called R3, that addresses an Achilles heel in contemporary refactoring technology, namely scripting performance. Unlike classical refactoring techniques that modify Abstract Syntax Trees (ASTs), R3 refactors programs by rendering ASTs via pretty printing. AST rendering never changes the AST; it only displays different views of the AST/program. Coupled with new ways to evaluate refactoring preconditions, R3 increases refactoring speed by an order of magnitude over Eclipse and facilitates computing views of a program where the original behavior is preserved. Third, we provide a feature-aware refactoring tool, called X15, for SPL codebases written in Java. X15 takes advantage of R3's view rendering to implement a projection technology in Feature-Oriented Software Development, which produces subprograms of the original SPL by hiding unneeded feature code. X15 is the first feature-aware refactoring tool for Java that implements a theory of refactoring feature modules, and allows users to edit and refactor SPL programs via views. In the most demanding experiments, X15 barely runs a second slower than R3, giving evidence that refactoring engines for SPL codebases can indeed be efficient.

student refactoring design patterns SPLs

R. Goncalves. Parallel Programming by Transformation. Ph.D. dissertation, 2015, The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto, Portugal.

The development of effient software requires the selection of algorithms and optimizations tailored for each target hardware platform. Alternatively, performance portability may be obtained through the use of optimized libraries. However, currently all the invaluable knowledge used to build optimized libraries is lost during the development process, limiting its reuse by other developers when implementing new operations or porting the software to a new hardware platform. To answer these challenges, we propose a model-driven approach and framework to encode and systematize the domain knowledge used by experts when building optimized libraries and program implementations. This knowledge is encoded by relating the domain operations with their implementations, capturing the fundamental equivalences of the domain, and defining how programs can be transformed by refinement (adding more implementation details), optimization (removing ineffiencies), and extension (adding features). These transformations enable the incremental derivation of efficient and correct by construction program implementations from abstract program specifications. Additionally, we designed an interpretations mechanism to associate different kinds of behavior to domain knowledge, allowing developers to animate programs and predict their properties (such as performance costs) during their derivation. We developed a tool to support the proposed framework, ReFlO, which we use to illustrate how knowledge is encoded and used to incrementally|and mechanically|derive effcient parallel program implementations in different application domains. The proposed approach is an important step to make the process of developing optimized software more systematic, and therefore more understandable and reusable. The knowledge systematization is also the first step to enable the automation of the development process.

DxT architectures dataflow categories higher-order transformations MDE student

V. Weber UTFM -- A Next Generation Language and Tool for Feature Modeling . M.Sc. Thesis 2014, Department of Computer Science, University of Twente, Enschede, Netherlands.

We propose the next generation feature modeling language: UTFM (University of Twente/University of Texas Feature Models), that generalizes classical feature models with instance and group cardinalities, feature replication, feature attributes (arithmetic, boolean and string attributes) and complex cross-tree constraints. The language explicitly separates type declarations, defining feature types, hierarchical relations, attributes and constraints, from model configurations. Configurations consist of instance declarations of feature types and value assignments to feature attributes. To validate the proposed language and analysis operations an proof-of-concept tool is implemented. In order to check satisfiability of configurations the tool translates UTFM models to Z3 decision problems (an off-the-shelf SMT theorem prover). To translate the models to the decision problems an UTFM to Z3 mapping is described for the various semantics.

student feature models analysis DRC semantics

B. Marker Design by Transformation: From Domain Knowledge to Optimized Program Generation . Ph.D. dissertation, 2014, The Department of Computer Sciences, The University of Texas at Austin.

Expert design knowledge is essential to develop a library of high-performance software. This includes how to implement and parallelize domain operations, how to optimize implementations, and estimates of which implementation choices are best. An expert repeatedly applies his knowledge, often in a rote and tedious way, to develop all of the related functionality expected from a domain-specific library. Expert knowledge is hard to gain and is easily lost over time when an expert forgets or when a new engineer starts developing code. The domain of dense linear algebra (DLA) is a prime example with software that is so well designed that much of experts' important work has become tediously rote in many ways. In this dissertation, we demonstrate how one can encode design knowledge for DLA so it can be automatically applied to generate code as an expert would or to generate better code. Further, the knowledge is encoded for perpetuity, so it can be reused to make implementing functionality on new hardware easier or it can be used to teach how software is designed to a non-expert. We call this approach to software engineering (encoding expert knowledge and automatically applying it) Design by Transformation (DxT). We present our vision, the methodology, a prototype code generation system, and possibilities when applying DxT to the domain of dense linear algebra.

student DxT

B. Delaware. Feature Modularity in Mechanized Reasoning Ph.D. dissertation, December 2013, The Department of Computer Sciences, The University of Texas at Austin.

Complex systems are naturally understood as combinations of their distinguishing characteristics or features. Distinct features differentiate between variations of configurable systems and also identify the novelties of extensions. The implementation of a conceptual feature is often scattered throughout an artifact, forcing designers to understand the entire artifact in order to reason about the behavior of a single feature. It is particularly challenging to independently develop novel extensions to complex systems as a result. This dissertation shows how to modularly reason about the implementation of conceptual features in both the formalizations of programming languages and object-oriented software product lines. In both domains, modular verification of features can be leveraged to reason about the behavior of artifacts in which they are included: fully mechanized metatheory proofs for programming languages can be synthesized from independently developed proofs, and programs built from well-formed feature modules are guaranteed to be well-formed without needing to be typechecked. Modular reasoning about individual features can furthermore be used to efficiently reason about families of languages and programs which share a common set of features.

Coq source code used.


C.H.P. Kim. Systematic Techniques for Efficiently Checking Software Product Lines Ph.D. dissertation, September 2013, The Department of Computer Sciences, The University of Texas at Austin.

A Software Product Line (SPL) is a family of related programs, which of each is defined by a combination of features. By developing related programs together, an SPL simultaneously reduces programming effort and satisfies multiple sets of requirements. Testing an SPL efficiently is challenging because a property must be checked for all the programs in the SPL, the number of which can be exponential in the number of features. In this dissertation, we present a suite of complementary static and dynamic techniques for efficient testing and runtime monitoring of SPLs, which can be divided into two categories. The first prunes programs, termed configurations, that are irrelevant to the property being tested. More specifically, for a given test, a static analysis identifies features that can influence the test outcome, so that the test needs to be run only on programs that include these features. A dynamic analysis counterpart also eliminates configurations that do not have to be tested, but does so by checking a simpler property and can be faster and more scalable. In addition, for runtime monitoring, a static analysis identifies configurations that can violate a safety property and only these configurations need to be monitored. When no configurations can be pruned, either by design of the test or due to ineffectiveness of program analyses, runtime similarity between configurations, arising due to design similarity between configurations of a product line, is exploited. In particular, shared execution runs all the configurations together, executing bytecode instructions common to the configurations just once. Deferred execution improves on shared execution by allowing multiple memory locations to be treated as a single memory location, which can increase the amount of sharing for object-oriented programs and for programs using arrays. The techniques have been evaluated and the results demonstrate that the techniques can be effective and can advance the idea that despite the feature combinatorics of an SPL structure can be exploited by automated analyses to make tesing more efficient.


M. Azanza. Model Driven Product Line Engineering: Core Asset and Process Implications Ph.D. dissertation, 2011, Department of Computer Languages and Systems, University of the Basque Country, Spain.

Reuse is at the heart of major improvements in productivity and quality in Software Engineering. Both Model Driven Engineering (MDE) and Software Product Line Engineering (SPLE) are software development paradigms that promote reuse. Specifically, they promote systematic reuse and a departure from craftsmanship towards an industrialization of the software development process. MDE and SPLE have established their benefits separately. Their combination, here called Model Driven Product Line Engineering (MDPLE), gathers together the advantages of both. Nevertheless, this blending requires MDE to be recasted in SPLE terms. This has implications on both the core assets and the software development process. The challenges are twofold: (i) models become central core assets from which products are obtained and (ii) the software development process needs to cater for the changes that SPLE and MDE introduce. This dissertation proposes a solution to the first challenge following a feature oriented approach, with an emphasis on reuse and early detection of inconsistencies. The second part is dedicated to assembly processes, a clear example of the complexity MDPLE introduces in software development processes. This work advocates for a new discipline inside the general software development process, i.e., the Assembly Plan Management, which raises the abstraction level and increases reuse in such processes. Different case studies illustrate the presented ideas.


J. Kim. Paan: A Tool for Back-Propagating Changes to Projected Documents . M.Sc. Thesis, May 2011, The Department of Computer Sciences, The University of Texas at Austin.

Research in Software Product Line Engineering (SPLE) traditionally focuses on product derivation. Prior work has explored the automated derivation of products by module composition. However, it has so far neglected propagating changes (edits) in a product back to the product line definition. A domain-specific product should be possible to update its features locally, and later these changes should be propagated back to the product line definition automatically. Otherwise, the entire product line has to be revised manually in order to make the changes permanent. Although this is the current state, it is a very error-prone process. To address these issues, we present a tool called Paan to create product lines of MS Word documents with back-propagation support. It is a diff-based tool that ignores unchanged fragments and reveals fragments that are changed, added or deleted. Paan takes a document with variation points (VPs) as input, and shreds it into building blocks called tiles. Only those tiles that are new or have changed must be updated in the tile repository. In this way, changes in composed documents can be back-propagated to their original feature module definitions. A document is synthesized by retrieving the appropriate tiles and composing them.

Paan web page.

student, analysis, origami, safe composition, verification

C. Kaestner. Virtual Separation of Concerns: Towards Preprocessors 2.0 . Ph.D. Thesis, Department of Technical and Business Information Systems, University of Magdeburg, Germany, May 2010.

Conditional compilation with preprocessors such as cpp is a simple but effective means to implement variability. By annotating code fragments with #ifdef and #endif directives, different program variants with or without these annotated fragments can be created, which can be used (among others) to implement software product lines. Although, such annotation-based approaches are frequently used in practice, researchers often criticize them for their negative effect on code quality and maintainability. In contrast to modularized implementations such as components or aspects, annotation-based implementations typically neglect separation of concerns, can entirely obfuscate the source code, and are prone to introduce subtle errors. Our goal is to rehabilitate annotation-based approaches by showing how tool support can address these problems. With views, we emulate modularity; with a visual representation of annotations, we reduce source code obfuscation and increase program comprehension; and with disciplined annotations and a product-line-aware type system, we prevent or detect syntax and type errors in the entire software product line. At the same time we emphasize unique benefits of annotations, including simplicity, expressiveness, and being language independent. All in all, we provide tool-based separation of concerns without necessarily dividing source code into physically separated modules; we name this approach virtual separation of concerns. We argue that with these improvements over contemporary preprocessors, virtual separation of concerns can compete with modularized implementation mechanisms. Despite our focus on annotation-based approaches, we do intend not give a definite answer on how to implement software product lines. Modular implementations and annotation-based implementations both have their advantages; we even present an integration and migration path between them. Our goal is to rehabilitate preprocessors and show that they are not a lost cause as many researchers think. On the contrary, we argue that -- with the presented improvements -- annotation-based approaches are a serious alternative for product-line implementation.

student, CIDE, safe composition, kaestner

T. Thuem. Reasoning About Feature Model Edits . B.Sc. Thesis, Department of Technical and Business Information Systems, University of Magdeburg, Germany, May 2008.

Feature modeling is a technique in domain analysis to describe the variabilities and commonalities among similar programs. The valid combinations of features are specified in a feature model. Automated analysis of feature models reveals errors and supports the configuration process. Feature models evolve over time and there is a need for edits on feature models. We categorize edits into refactorings, specializations, and generalizations, and arbitrary edits to support the modeling process. This paper presents an algorithm to reason about feature model edits and their categories. Two stand-alone feature models can be taken as input, where the set of features is not necessary identical. We evaluate the algorithm with random feature models to show that the algorithm scales even for larger feature models.

student, feature models, analysis, thuem

S. Trujillo. Feature Oriented Model Driven Product Lines . Ph.D. Dissertation, Department of Computer Sciences, The University of the Basque Country, Spain, March 2007.

Model Driven Development (MDD) is an emerging paradigm for software construction that uses models to specify programs, and model transformations to synthesize executables. Feature Oriented Programming (FOP) is a paradigm for software product lines where programs are synthesized by composing features. Feature Oriented Model Driven Development (FOMDD) is a blend of FOP and MDD that shows how programs in a software product line can be synthesized in an MDD way by composing models from features, and then transforming these models into executables. A case study on a product line of portlets, which are components of web portals, is used to illustrate FOMDD. This case reveals mathematical properties (i.e., commuting diagrams) of portlet synthesis that helped us to validate the correctness of our approach (abstractions, tools and specifications), as well as optimize portlet synthesis. These properties expose the nature of metaprograms for program synthesis (i.e., synthesis is a metaprogram combining primitive operations that form a geometry). We exploit this to synthesize metaprograms, which when executed, will synthesize a target program of a product-line. Specifically, we elaborate on the generation of metaprograms (not programs) from abstract specifications. This is the core of GROVE: the GeneRative metaprOgramming for Variable structurE approach. Variability of synthesis is also considered. Nonetheless, the ultimate envision is a structural theory behind program synthesis.

student, MDD, MDE, MDA, geodesic, higher-order transformations, kubes, origami, categories, commuting diagrams

C. Kaestner. Aspect-Oriented Refactoring of Berkeley DB . M.Sc. Thesis, Department of Technical and Business Information Systems, University of Magdeburg, Germany, March 2007.

Aspect-oriented programming has been repeatedly suggested to implement software product lines (SPL). In this thesis we create a large case study by refactoring the open source project Berkeley DB into an SPL with 38 features using AspectJ. We focus on expressibility, readability, the role of bounded quantification, and feature interactions. Contrary to our initial expectations, the results were not encouraging. We document problems encountered and suggest solutions and directions for further research.

refactoring, student, aspects, kaestner

S. Apel. The Role of Features and Aspects in Software Development . Ph.D. Dissertation, Department of Technical and Business Information Systems, University of Magdeburg, Germany, March 2007.

Feature-Oriented Programming (FOP) and Aspect-Oriented Programming (AOP) are complementary technologies. Though they aim at crosscutting modularity, they do so in different ways. We observed that FOP and AOP can be combined to overcome their individual limitations. Consequently, we propose Aspectual Feature Modules (AFMs), a representative approach that unifies AOP and FOP. From this symbiosis we derive the novel notion of Aspect Refinement (AR) that integrates aspects into the stepwise development philosophy of FOP. We use AFMs and AR in a non-trivial case study to create a product line of overlay networks. We also present a set of guidelines to assist programmers in how and when to use FOP and AOP techniques for implementing product lines in a stepwise and generative manner. Finally, we answer the question of how FOP and AOP-related implementation techniques are used today by analyzing a representative set of AspectJ programs of different sizes. We observe that aspects are used frequently for implementation problems that are closely related to FOP. We discuss why this is not surprising.

student, aspects

S. Thaker. Design and Analysis of Multidimensional Program Structures . M.Sc. Thesis, December 2006, The Department of Computer Sciences, The University of Texas at Austin.

Software design is an art. Just as understanding of the structure of atomic elements is critical to the natural sciences, it too is fundamental to the science behind software engineering. Feature-Oriented Programming and compositional programming embody ideas of software decomposition structure. Within feature-decomposition, multidimensional structures are of particular interest. Multidimensional structures capture a higher-order structural relationship within conventional hierarchal decomposition. They simplify a complex decomposition's representation, understanding, and compositional specification. In this thesis we first overview multidimensional structures. We then identify two critical properties a decomposition must satisfy and devise algorithms for efficiently verifying them. The Safe Composition property guarantees lack of type-errors at composition-time. The Orthogonality property requires a multidimensional decomposition to be identical along all dimensions. In identifying and verifying key properties we have also demonstrated how a structural model can simplify our understanding and applicability of essential principles. When the fundamental structure is well-understood, software design is amenable to automation.

student, analysis, origami, safe composition, verification

M. Grechanik Design and Analysis of Interoperating Components . Ph.D. dissertation, September 2006, The Department of Computer Sciences, The University of Texas at Austin.

Components are modular units (e.g., objects, modules, or programs) that interact by exchanging data. Two or more components interoperate when they exchange information. We propose an abstraction in which foreign objects (i.e., objects that are not defined in a host programming language) are abstracted as graphs and abstract operations access and manipulate them. Using this abstraction, we build a framework called Reification Object-Oriented Framework (ROOF) to reduce the multiplicity of platform API calls to a small set of operations on foreign objects that are common to all platforms. Based on our abstraction, we designed Foreign Object REification Language (FOREL), an extension for object-oriented languages whose type checking coupled with a conservative static analysis mechanism reports potential errors when referencing foreign components. Also, we design and build a tool for legacy code called a Verifier for Interoperating cOmponents for finding Logic fAults (Viola) that finds errors in components exchanging XML data.

student, analysis, verification

R.E. Lopez-Herrejon. Understanding Feature Modularity . Ph.D. dissertation, September 2006, The Department of Computer Sciences, The University of Texas at Austin.

An undesirable consequence is that developers must lower their abstractions from features to those provided by the underlying implementation languages, a process that is far from simple let alone amenable to significant automation. The conceptual gap created between feature abstractions and their modularization hinders program understanding and product line development. The root of the problem is the fact that feature modularity is not well understood and thus not well supported in conventional programming languages, modularization mechanisms, and design techniques. In this dissertation, we explore language and modularity support for features founded on an algebraic model geared for program synthesis. Our model integrates ideas from collaboration-based designs, mixin layers, aspect oriented programming, multidimensional separation of concerns, and generative programming. We assess our model with an implementation of a non-trivial product line case study, and evaluate feature support in emerging modularization technologies.

student, origami, aspects

A. Rauschmayer. Probe: Visualizing Algebraic Transformations in the GenBorg Generative Software Environment, Institut fuer Informatik Ludwig Maximilians Universitaet Muenchen, December, 2001.

Software is becoming increasingly pervasive and complex. The greatest challenge for the industry today is how to produce more code without compromising quality or cost. GenBorg is a programming paradigm whose answer is to scale the units of reuse from components of individual applications to components of application suites (e. g., MS Office). While traditional approaches trade size of code units for flexibility, GenBorg uses techniques from generative programming and feature-oriented programming in order to provide models that instantiate code and non-code artifacts of customizable application suites. Because developing software is not limited to writing code, but current ways of automating development are, a lot of human energy is wasted for keeping related data such as documentation and formal properties consistent with source code. GenBorg manages to evolve all these artifacts automatically and in parallel. GenBorg's ideas are backed by a theoretical foundation, the GenBorg algebra. The Java application Probe is a first implementation of the GenBorg programming paradigm and uses a simple, file-based data structure for defining GenBorg models. Its graphical user interface makes it easy to navigate through and instantiate from large models.


L.A. Tokuda. Evolving Object-Oriented Designs with Refactorings . Ph.D. dissertation, September 1999, The Department of Computer Sciences, The University of Texas at Austin.

Refactorings are behavior-preserving program transformations that automate design evolution in object-oriented applications. Three kinds of design evolution are: schema transformations, the introduction of design pattern micro-architectures, and the hot-spot-driven-approach. This research shows that all three are automatable with refactorings. A comprehensive list of refactorings for design evolution is provided and an analysis of supported schema transformations, design patterns, and hot-spot meta patterns is presented. Further, we evaluate whether refactoring technology can be transferred to the mainstream by restructuring non-trivial C++ applications. The applications that we examine were evolved manually by software engineers. We show that an equivalent evolution could be reproduced significantly faster and cheaper by applying a handful of general-purpose refactorings. In one application, over 14K lines of code were transformed automatically that otherwise would have been coded by hand. Our experiments identify benefits, limitations, and topics of further research related to the transfer of refactoring technology to a production environment.

student refactoring analysis design patterns

Y. Smaragdakis. Implementing Large-Scale Object-Oriented Components . Ph.D. dissertation, September 1999, The Department of Computer Sciences, The University of Texas at Austin.

The complexity of software has driven both researchers and practitioners toward design methodologies that decompose problems into intellectually manageable pieces and that assemble partial products into complete software artifacts. Modularity in design, however, rarely translates into modularity at the implementation level. Hence, an important problem is to provide implementation (i.e., programming language) support for expressing modular designs concisely. This dissertation shows that software can be conveniently modularized using large-scale object-oriented software components. Such large-scale components encapsulate multiple classes but can themselves be viewed as classes, as they support the object-oriented mechanisms of encapsulation and inheritance. This conceptual approach has several concrete applications. First, existing language mechanisms, like a pattern of inheritance, class nesting, and parameterization, can be used to simulate large-scale components called mixin layers. Mixin layers are ideal for implementing certain forms of object-oriented designs and result in simpler and more concise implementations than those possible with previous methodologies. Second, we propose new language constructs to provide better support for component-based programming. These constructs express components cleanly (i.e., without unwanted interactions with other language mechanisms) and address the issue of component type-checking.

mixin layers, student

V.P. Singhal. A Programming Language for Writing Domain-Specific Software System Generators . Ph.D. Dissertation. Department of Computer Sciences, University of Texas at Austin, September 1996.

Automating routine programming tasks is an effective way to increase the productivity of software development. Software system generators have the potential to achieve this goal: customized software systems can be quickly and easily assembled from component libraries. Our research demonstrates that for generators to be successful, component libraries must be scalable. Scalability enables libraries to be small, because the components of the library implement distinct and largely orthogonal features. These components can be combined to yield an enormous family of software systems and subsystems. Generators thus become tools for combining components to manufacture these systems and subsystems. In GenVoca, the programming model that forms the foundation of our research, components act as large-scale refinements which simultaneously transform multiple classes from one abstraction to another. Because GenVoca advocates a novel style of program organization, there is little language or tool support for this paradigm. We have developed a programming language called P++, which extends C++ with specialized constructs to support the GenVoca model. It permits components to be treated as transformations which can simultaneously refine several classes in a consistent manner. To validate the utility of this language, we solved a challenge problem in software reuse: we reimplemented the Booch C++ Components data structures library as a scalable P++ library. We were able to reduce the volume of code and number of components by approximately a factor of four, without compromising the performance of generated systems.


D. Das. Making Database Optimizers More Extensible . Ph.D. Dissertation. Department of Computer Sciences, University of Texas at Austin, May 1995.

Query optimizers are fundamental components of database management systems (DBMSs). An optimizer consists of three features: a search space, a cost model, and a search strategy. The experience of many researchers has shown that hard-wiring these features results in an optimizer that is very inflexible and difficult to modify. Rule-based optimizers have been developed to alleviate some of the problems of monolithic optimizers. Unfortunately, contemporary rule-based optimizers do not provide enough support to enable database implementers (DBI) to fully realize the potential of open systems. We have identified four requirements that a rule-based optimizer should satisfy to address these needs. First, rules should be specified using high-level abstractions, insulating the DBI from underlying implementation details. Second, rule sets should be easily extensible, with a minimum of reprogramming required. Third, rule sets should be easily reconfigurable, that is, changeable to meet a variety of user needs, interfaces, database schemes, etc. Fourth, rule-based optimizers should be fast, that is, performance should not be sacrificed for the sake of high-level specifications. In this dissertation, we describe Prairie, an environment for specifying rules for rule-based optimizers that satisfies all four of the above requirements. The Prairie specification language is presented and we show how it allows a DBI to design an easily extensible rule set for a rule-based optimizer. Experimental results are presented using the Texas Instruments Open ODD optimizer rule set to validate the claim of good performance using Prairie. Finally, a building blocks approach of constructing rule sets is presented; this results in easily reconfigurable rule sets whose features are changeable simply by assembling the blocks in various ways.


E.E. Villarreal. Automated Compiler Generation for Extensible Data Languages . Ph.D. Dissertation. Department of Computer Sciences, University of Texas at Austin, December 1994.

To meet the changing needs of the DBMS community, e.g., to support new database applications such as geographic or temporal databases, new data languages are frequently proposed. Most offer extensions to previously defined languages such as SQL or Quel. Few are ever implemented. The maturity of the area of data languages demands that researchers go beyond the proposal stage to have hands-on experience with their languages, if only to separate the good ideas from the bad. Tools and methodologies for building families of similar languages are definitely needed; we solve this problem by automating the generation of compilers for data languages. Our work, Rosetta, is based on two concepts. First, underlying the domain of data languages is a common backplane of relational operations. Backplane operations are primitive building blocks for language execution and construction, where a building block has standardized semantics. The definition of a well-designed backplane is implementation-independent; that is, the backplane is defined once but can be used to model arbitrarily many data languages. Second, there exist primitive building-blocks for language construction. From our analysis of the database data language domain, we have identified three classes of building-blocks: one class maps language syntax to backplane functions, another builds an internal representation of the backplane operator tree, and a third class manages contextual information. For modeling data languages, we define the Rosetta specification language, a grammar-based specification language tailored to our needs with the power to define syntax, map it to the target language, and build an operator tree all in one rule. Thus each rule is a microcosmic model of a language clause which encapsulates input parsing and code generation. Our specification language models data languages based on the composition of primitive building blocks for semantics and the customization of the syntax for invoking the compositions. A compiler for a data language is generated by first modeling the language and then compiling the specification. The ease and efficiency with which Rosetta customizes languages derives from the reuse of the backplane operations and the high-level specification supported.


M.J. Sirkin. A Software System Generator for Data Structures . Ph.D. Dissertation. Department of Computer Science and Engineering, University of Washington, March 1994.

Although data structures are a fundamental part of most applications, using and writing data structures is time-consuming, difficult, and error-prone. Programmers often select inappropriate data structures for their applications because they do not know which data structure to use, they do not know how to implement a particular data structure, or they do not have an existing implementation of the data structure to use. This dissertation describes a model and a technology for overcoming these problems. Our approach is based on non-traditional parameterized types (NPTs). NPTs are an extension to traditional parameterized types (TPTs), which are already familiar to most programmers. Our NPTs are based on the GenVoca domain modeling concepts, vertical parameterization, a consistent high-level interface, and a transformational compiler. Our research has led to the construction of a software system generator for data structures called Predator. Predator is able to transform data structure declarations and data structure-independent functions into efficient code. Predator also allows programmers to adjust a data structure's implementation by simply changing its declaration and recompiling. This dissertation discusses our model (and how it differs from standard models), our Predator compiler, and the results of our validation efforts.

student P3