Publications

Here are the publications of our research group listed in order of publication date. Each entry has a citation, an abstract, and a hypertext link to a PDF copy of the paper. See the copyright notice.

To search for multiple keywords separate each keyword by a comma: keyword , keyword

Keywords include:

architectures, analysis, aspects, categories, commuting diagrams, computational design, derivatives, Dense Linear Algebra (DLA),
Design by Transformation (DxT), design rules, Design Rule Checking (DRC), evolution, feature interactions, feature models, geodesic,
higher-order transformations, kubes, MDA, MDD, MDE, mixin layers, optimization, optional feature problem, origami, refactoring,
streaming, safe composition, semantics, testing, UML, verification, wizards, P3 (P1/P2/P3/DiSTiL data structure generator).

See all publications

Total Publications: 210

N. Altoyan, D. Batory. On Proving the Correctness of Refactoring Class Diagrams of MDE Metamodels. ACM Transactions on Software Engineering and Methodology (TOSEM), April 2023

Model Driven Engineering (MDE) is general-purpose engineering methodology to elevate system design, maintenance, and analysis to corresponding activities on models. Models (graphical and/or textual) of a target application are automatically transformed into source code, performance models, Promela files (for model checking), and so on for system analysis and construction. Models are instances of metamodels. One form an MDE metamodel can take is a [class diagram, constraints] pair: the class diagram defines all object diagrams that could be metamodel instances; OCL constraints eliminate semantically undesirable instances. A metamodel refactoring is an invertible semantics-preserving co-transformation, i.e., it transforms both a metamodel and its models without losing data. This paper addresses a subproblem of metamodel refactoring: how to prove the correctness of refactorings of class diagrams without OCL constraints using the Coq Proof Assistant.

class diagram refactorings Coq object diagram refactorings verification

D-J. Munoz, J. Oh, M. Pinto, L. Fuentes, D. Batory. Nemo: A Tool to Transform Feature Models with Numerical Features and Arithmetic Constraints. International Conference on Software Reuse (ICSR), June 2022

Real-world Software Product Lines (SPLs) need Numerical Feature Models (NFMs) whose features not only have boolean values satisfying boolean constraints, but also have numeric attributes satisfying arithmetic constraints. A key operation on NFMs finds near-optimal performing products, which requires counting the number of SPL products. Typical constraint satisfaction solvers perform poorly on counting. Nemo (Numbers, features, models) supports NFMs by bit-blasting, the technique that encodes arithmetic as boolean clauses. Nemo translates NFMs to propositional formulas whose products can be counted efficiently by #SAT solvers, enabling near-optimal products to be found. We evaluate Nemo with a diverse set of real-world NFMs, complex arithmetic constraints, and counting experiments in this paper.

award bit-blasting propositional formula numerical features model counting

R. Heradio, D. Fernandez-Amoros, J.A. Galindo, D. Benavides, D. Batory. Uniform and Scalable Sampling of Highly-Configurable Systems. Empirical Software Engineering (Dec 2021), Software Product Lines and Variability-Rich Systems (SPLC 2021)

Many analyses on configurable software systems are intractable when confronted with colossal and highly-constrained configuration spaces. These analyses could instead use statistical inference, where a tractable sample accurately predicts results for the entire space. To do so, the laws of statistical inference requires each member of the population to be equally likely to be included in the sample, i.e., the sampling process needs to be uniform. SAT-samplers have been developed to generate uniform random samples at a reasonable computational cost. However, there is a lack of experimental validation over colossal spaces to show whether the samplers indeed produce uniform samples or not. This paper (i) proposes a new sampler named BDDSampler, (ii) presents a new statistical test to verify sampler uniformity, and (iii) reports the evaluation of BDDSampler and five other state-of-the-art samplers: KUS, QuickSampler, Smarch, Spur, and Unigen2. Our experimental results show only BDDSampler satisfies both scalability and uniformity.

uniform random sampling Configurable systems BDDs SAT solvers

D. Batory, J. Oh, R. Heradio, D. Benavides. Product Optimization in Stepwise Design. Logic, Computation and Rigorous Methods, LNCS 12750.

Stepwise design of programs is a divide-and-conquer strategy to control complexity in program modularization and theorems. It has been studied extensively in the last 30 years and has worked well, although it is not yet commonplace. This paper explores a new area of research, finding efficient products in colossal product spaces, that builds upon past work.

optimization scalable searching uniform random sampling configurable systems configuration spaces machine learning

N. Altoyan. MDE Refactorings: A Categorical Framework with Proofs, Tools, and Implementations. Ph.D. dissertation, August 2020, The Department of Electrical and Computer Engineering, The University of Texas at Austin.

Just as OO refactorings are a staple of OO program development and evolution, so too should metamodel refactorings be a staple of MDE metamodel development and evolution. A refactoring is a bijective semantics-preserving transformation. An MDE refactoring is a pair of corresponding refactorings: one at the metamodel level and the other at the model level. Many insightful papers on implementing MDE refactorings have been written. Our focus is different: proofs of refactoring correctness. Despite most refactorings are relatively simple and intuitively correct, proving their correctness is a major challenge.
In this thesis we formalize MDE refactorings in terms of relational database refactorings, express refactoring correctness in terms of CategoryTheory, and prove theorems using Coq, a modern theorem prover.
Our research is in two parts: refactoring class diagrams while preserving their semantics, and refactoring constraints defined over these diagrams. The first half of this dissertation focuses on the validity of refactoring class diagrams by presenting a framework of formal proofs. The second half uses a central result of the first to study how constraints are affected by class diagram refactorings and how refactorings do not invalidate previously satisfied constraints. Two significant results of the second half are (a) a pure Java constraint language, Aocl, that is quite similar to OCL but without OCL's complexity, and (b) a simple and practical way to evaluate constraints over refactored class diagrams and their object models

student, MDE, refactorings, verification, category theory, categories, Aocl, metamodel refactorings

D. Batory and N. Altoyan. Aocl: A Pure-Java Constraint and Transformation Language for MDE . 8th Int. Conference on Model-Driven Engineering and Software Development (MODELSWARD 2020).

OCL is a standard MDE language to express constraints. OCL has been criticized for being too complicated, over-engineered, and difficult to learn. But beneath OCLs complicated exterior is an elegant language based on relational algebra. We call this language Aocl, which has a straightforward implementation in Java. Aocl can be used to write OCL-like constraints and model transformations in Java. A simple MDE tool generates an Aocl Java 8.0 package from an input class diagram for Aocl to be used.

Enlarged UTCS Technical Report TR-19-04 .
PowerPoint w. Audio Conference Presentation .

OCL, MDE Constraint Language, Aocl, MDE Transformation Language

D. Batory. Should Future Variability Modeling Languages Express Constraints in OCL? . First International Workshop on Languages for Modelling Variability (MODEVAR 2019).

Since the mid-2000s, Propositional Logic (PL) has been the de facto language to express constraints in Feature Models (FMs) of Software Product Line (SPLs). PL was adequate because product configurations were formed by binary decisions including or not including features in a product. Inspired by both prior research and practical systems (eg., SPLs that use KConfig), future FMs must go beyond PL and admit numerical (and maybe even text) variables and their constraints.
The Object Constraint Language (OCL) is a general-purpose declarative constraint language for Model Driven Engineering (MDE), which admits virtually any kind of variable and constraint in metamodels. We should expect future FMs to be examples of MDE metamodels. This raises a basic question: Should OCL be used to express constraints of future variability modeling language(s)? In this talk, I outline the pros and cons for doing so.

OCL Constraint-Language

J. Oh, P. Gazzillo, D. Batory. t-wise Coverage by Uniform Sampling . Systems and Software Product Line Conference (SPLC 19)..

Efficiently testing large configuration spaces of Software Product Lines (SPLs) needs a sampling algorithm that is both scalable and provides good t-wise coverage. The 2019 SPLC Sampling Challenge provides large real-world feature models and asks for a t-wise sampling algorithm that can work for those models. We evaluated t-wise coverage by uniform sampling ( US ) the configurations of one of the provided feature models. US means that every (legal) configuration is equally likely to be selected. US yields statistically representative samples of a configuration space and can be used as a baseline to compare other sampling algorithms. We used existing algorithm called Smarch to uniformly sample SPL configurations. While uniform sampling alone was not enough to produce 100% 1-wise and 2-wise coverage, we used standard probabilistic analysis to explain our experimental results and to conjecture how uniform sampling may enhance the scalability of existing t-wise sampling algorithms.

t-wise-coverage uniform-sampling

D-J. Munoz, J. Oh, M. Pinto, L. Fuentes, D. Batory. Uniform Random Sampling Product Configurations of Feature Models That Have Numerical Features Systems and Software Product Line Conference (SPLC 19).

Analyses of Software Product Lines (SPLs) rely on automated solvers to navigate complex dependencies among features and find legal configurations. Often these analyses do not support numerical features with constraints because propositional formulas use only Boolean variables. Some automated solvers can represent numerical features natively, but are limited in their ability to count and Uniform Random Sample (URS) configurations, which are key operations to derive unbiased statistics on configuration spaces. Bit-blasting is a technique to encode numerical constraints as propositional formulas. We use bit-blasting to encode Boolean and numerical constraints so that we can exploit existing #SAT solvers to count and URS configurations. Compared to state-of-art Satisfiability Modulo Theory and Constraint Programming solvers, our approach has two advantages: 1) faster and more scalable configuration counting and 2) reliable URS of SPL configurations. We also show that our work can be used to extend prior SAT-based SPL analyses to support numerical features and constraints.

feature model bit-blasting propositional formula numerical features model counting SPLs

J. Kim, D. Batory, and M. Gligoric. Code Transformation Issues in Move-Instance-Method Refactorings . 3rd International Workshop on Refactoring, May 2019.

Refactorings, by definition, preserve the behavior of a target program. Such a strong semantic property is encoded by a set of preconditions for each refactoring. Only if all preconditions are satisfied will a target program be transformed. The code transformation that implements the refactoring follows another set of rules to produce syntactically-correct, refactored code. Consequently, it is easy to believe that most behavior changing violations in refactorings are induced by incorrect preconditions or lack of required checks. In this paper, however, we show that code transformations for Move-Instance-Method Refactoring available in several popular Java Integrated Development Environments do not preserve program behavior. We report these errors and propose solutions for each identified problem.

refactoring errors
D. Batory. A Roadmap to Revolutionize SPL Technology by 2025 . Dagstuhl Seminar on Software Evolution in Time and Space, May 2019.

Integrating variability (a.k.a. software product lines (SPLs)) with version control is an exciting and largely unexplored research topic. I encountered this topic in 2007 when visiting a US government facility that created SPLs for a fleet of ships. In following this lead, I encountered yet another technical problem that was similar, but (I thought) more critical to explore -- the integration of variability (SPLs) with refactoring -- and the need modernize (Java or OO) IDE support for building SPLs, so that modern OO program development practices can be applied to SPLs and one-of-a-kind programs.
In this talk, I explain key ideas my colleagues and I discovered in integrating variability with OO refactoring, and how the challenges are very similar to that of integrating variability with version control. I believe it will be useful to be aware of these similarities as research with version control proceeds, because ultimately what modern SPL tooling requires is an integration of variability with version control AND refactorings.

keynote refactoring version control
K. Wang, C. Zhu, A. Celik, J. Kim, D. Batory, and M. Gligoric. Towards Refactoring-Aware Regression Test Selection . International Conference on Software Engineering (ICSE 2018) .

Regression testing checks that recent project changes do not break previously working functionality. Although important, regression testing is costly when changes are frequent. Regression test selection (RTS) optimizes regression testing by running only tests whose results might be affected by a change. Traditionally, RTS collects dependencies (e.g., on fies) for each test and skips the tests, at a new project revision, whose dependencies did not change. Existing RTS techniques do not differentiate behavior-preserving transformations (i.e., refactorings) from other code changes. As a result, tests are run more frequently than necessary.
We present the first step towards a refactoring-aware RTS technique, dubbed Reks, which skips tests affected only by behavior preserving changes. Reks defines rules to update the test dependencies without running the tests. To ensure that Reks does not hide any bug introduced by the refactoring engines, we integrate Reks only in the pre-submit testing phase, which happens on the developers machines. We evaluate Reks by measuring the savings in the testing effort. Specifically, we reproduce 100 refactoring tasks performed by developers of 37 projects on GitHub. Our results show that Reks would not run, on average, 33% of available tests (that would be run by a refactoring-unaware RTS technique). Additionally, we systematically run 27 refactoring types on ten projects. The results, based on 74,160 refactoring tasks, show that Reks would not run, on average, 16% of tests (max: 97% and SD: 24%). Finally, our results show that the Reks update rules are efficient.

refactoring software evolution
P. Angulo-Lopez. Semi-Automatic Program Partitioning . Ph.D. dissertation, 2017, Department Information Technologies and Communications, Technologcio de Monterrey.

Partitioning a program into features is a fundamental problem in both Software Product Line (SPL) and object-oriented framework research. An SPL is a family of related programs that are created and differentiated by features - increments in program functionality. An extractive way to create an SPL is to refactor a legacy program into a stack of feature modules or layers. We developed BttF, a tool that infers the feature membership of all package, class, and member declarations of a Java program from facts given by a program expert. BttF recognizes hooks to be methods that defy classical layering in that they reference declarations of layers/feature modules that are added in the future. The information collected by BttF is used to feature-annotate the program, giving it a feature-based design. Doing so enables optional or unnecessary features to be removed from the program's codebase to produce subprograms, the beginnings of an SPL. Case studies validate BttF, the best result yielded by our experiments show that BttF is capable of inferring the assignment of over 80% of a program's declarations, this means that less than 20% of the assignments were provided as facts by an expert.
An object-oriented framework is a two-tiered structure: a framework (consisting of many abstract classes) that encode common and shared behavior in a domain of applications, and a plug-in (having a concrete class for each framework abstract class) to complete an application. Refactoring a legacy application into a framework and plug-in is a common activity. Surprisingly, tools to perform this task are absent, even though frameworks have been in use for 25 years. After all this time, it remains a tedious, error-prone, and manual task. Two tools are needed: one that determines how classes of a legacy application and their members are partitioned across a framework and plug-in, and which methods are hooks. A second tool takes this partitioning information as input and runs a refactoring script that invokes refactorings programmatically to transform the legacy application into a framework and a plug-in. R3 is a new refactoring engine for Java that supports the writing of transformation scripts. We present the integration of BttF and R3 to cover this need, and evaluate this integration by effectively transforming a set of six Java programs into a framework and plug-in structure.
Furthermore, we use BttF as a tool to identify features in a non-object-oriented environment: Multi-Agent Systems (MAS). This active research area has started to see some efforts to integrate SPL concepts and technologies, however, identifying features in existing MAS was a problem not being addressed. BttF was successfully used to identify features on six MAS simulators developed on GAMA platform.

student refactoring frameworks SPLs
J. Oh, D. Batory, M. Myers, and N. Siegmund. Finding Near-Optimal Configurations in Product Lines by Random Sampling . ACM Foundations Of Software Engineering (FSE) . September 2017, Paderborn, Germany.

Software Product Lines (SPLs) are highly configurable systems. This raises the challenge to find near-optimal performing configurations for an anticipated workload. As SPL configuration spaces are huge, it is infeasible to benchmark all configurations to find an optimal one. Prior work focused on building performance models to predict and optimize SPL configurations; we randomly sample and recursively search a configuration space to find good configurations without constructing a prediction model. Our algorithms are simpler and have higher accuracy and efficiency.

Presentation Slides.

random sampling, configuration spaces.
D. Batory. Test of Time Award and Conjectures on the Future of SPLs. Software Product Line Conference (SPLC) . September 2017, Seville, Spain.

This talk has two parts: First, I briefly summarize and explain the paper 'Feature Models, Grammars, and Propositional Formulas' and acknowledge many researchers who have contributed to this and related results over the last decade. Second, I convey my insights on how scientific knowledge progresses and matures, the bumps in the road that we are all experiencing now as a consequence, and the existential challenges ahead for many sub-disciplines in Software Engineering, including Software Product Lines.

PDF of Presentation Slides.

award keynote feature models product lines
J. Kim, D. Batory, D. Dig. Refactoring Java Software Product Lines . Software Product Line Conference (SPLC) . September 2017, Seville, Spain.

Refactoring is a staple of Object-Oriented (OO) program development. It should be a staple of OO Software Product Line (SPL) development too. X15 integrates state-of-the-art Java refactoring and SPL technologies, and is the first tool to support the refactoring of Java SPL codebases. X15 (1) uses Java custom annotations to encode variability in feature-based Java SPLs, (2) projects a view of a SPL product (a program that corresponds to a legal SPL configuration), and (3) allows programmers to edit and refactor the product, propagating changes back to the SPL codebase. Case studies apply 2316 refactorings in 8 public Java SPLs and show that X15 is as efficient, expressive, and scalable as a state-of-the-art feature-unaware Java refactoring engine.

Data from paper.

refactoring SPL design patterns X15 x15
J. Kim, D. Batory, D. Dig. X15: A Tool For Refactoring Java Software Product Lines . Software Product Line Conference (SPLC) Tool Demonstrations . September 2017, Seville, Spain.

X15 is the first tool that can apply common object-oriented refactorings to Java Software Product Lines (SPLs). X15 is also the first tool that programmers can write custom scripts (to call refactorings programmatically) to retrofit design patterns into Java SPLs. We motivate and illustrate X15’s unique capabilities in this paper.

refactoring SPL design patterns X15 x15
J. Kim. Reflective and Relativistic Refactoring with Feature-Awareness Ph.D. dissertation, 2017, The Department of Computer Sciences, The University of Texas at Austin.

Refactoring is a core technology in modern software development. It is central to popular software design movements, such as Extreme Programming and Agile software development, and all major Integrated Development Environments (IDEs) today offer some form of refactoring support. Despite this, refactoring engines have languished behind research. Modern IDEs offer no means to sequence refactorings to automate program changes. Further, current refactoring engines exhibit problems of speed and expressivity, which makes writing composite refactorings such as design patterns infeasible. Even worse, existing refactoring tools for Object-Oriented languages are unaware of configurations in Software Product Lines (SPLs) codebases. With this motivation in mind, this dissertation makes three contributions to address these issues:
First, we present the Java API library, called R2, to script Eclipse refactorings to retrofit design patterns into existing programs. We encoded 18 out of 23 design patterns described by Gang-of-Four as R2 scripts and explain why the remaining refactorings are inappropriate for refactoring engines. R2 sheds light on why refactoring speed and expressiveness are critical issues for scripting.
Second, we present a new Java refactoring engine, called R3, that addresses an Achilles heel in contemporary refactoring technology, namely scripting performance. Unlike classical refactoring techniques that modify Abstract Syntax Trees (ASTs), R3 refactors programs by rendering ASTs via pretty printing. AST rendering never changes the AST; it only displays different views of the AST/program. Coupled with new ways to evaluate refactoring preconditions, R3 increases refactoring speed by an order of magnitude over Eclipse and facilitates computing views of a program where the original behavior is preserved.
Third, we provide a feature-aware refactoring tool, called X15, for SPL codebases written in Java. X15 takes advantage of R3's view rendering to implement a projection technology in Feature-Oriented Software Development, which produces subprograms of the original SPL by hiding unneeded feature code. X15 is the first feature-aware refactoring tool for Java that implements a theory of refactoring feature modules, and allows users to edit and refactor SPL programs via views. In the most demanding experiments, X15 barely runs a second slower than R3, giving evidence that refactoring engines for SPL codebases can indeed be efficient.

student refactoring design patterns SPLs
D. Batory. Teaching Modeling and Variability in Software Design and its Importance to Science . Keynote at Workshop on Modelling in Software Engineering (MiSE), Austin, Texas, May 2016 and at Workshop on Formal Methods in Software Engineering (FormaliSE), Austin, Texas, May 2016.

The goal of Automated Software Design (ASD) is to automate tasks of program development and to identify the "laws" or algebraic identities -- general or domain specific -- that software engineers use implicitly in designs but may not realize it. I trace this idea from material that I present in my classes to fundamental results and recent advances in ASD research. PPTX of presentation.

DxT, keynote, laws, relational query optimization, UML, refactorings
D. Dig, R.Johnson, D. Marinov, B. Bailey, and D. Batory. COPE: Vision for a Change-Oriented Programming Environment . Visions Track, International Conference on Software Engineering (ICSE) , Austin, Texas, May 2016.

Software engineering involves a lot of change as code artifacts are not only created once but maintained over time. In the last 25 years, major paradigms of program development have arisen -- agile development with refactorings, software product lines, moving sequential code to multicore or cloud, etc. Each is centered on particular kinds of change; their conceptual foundations rely on transformations that (semi-) automate these changes. We are exploring how transformations can be placed at the center of software development in future IDEs, and when such a view can provide benefits over the traditional view. Cope, a Change-Oriented Programming Environment, looks at 5 activities: (1) analyze what changes programmers typically make and how they perceive, recall, and communicate changes, (2) automate transformations to make it easier to apply and script changes, (3) develop tools that compose and manipulate transformations to make it easier to reuse them, (4) integrate transformations with version control to provide better ways for archiving and understanding changes, and (5) develop tools that infer higher-level transformations from lower-level changes. Characterizing software development in terms of transformations is an essential step to take software engineering from manual development to (semi-) automated development of software.

IDE
J. Kim, D. Batory, D. Dig and M. Azanza. Improving Refactoring Speed by 10x . International Conference on Software Engineering (ICSE) . Austin, Texas, May 2016.

Refactoring engines are standard tools in today's Integrated Development Environments (IDEs). They allow programmers to perform one refactoring at a time, but programmers need more. Most design patterns in the Gang-of-Four text can be written as a refactoring script - a programmatic sequence of refactorings. In this paper, we present R3, a new Java refactoring engine that supports refactoring scripts. It builds a main-memory, non-persistent database to encode Java entity declarations (e.g., packages, classes, methods), their containment relationships, and language features such as inheritance and modifiers. Unlike classical refactoring engines that modify Abstract Syntax Trees (ASTs), R3 refactorings modify only the database and refactored code is produced only when pretty-printing ASTs that reference database changes. R3 performs comparable precondition checks to those of the Eclipse Java Development Tools (JDT) but R3's codebase is about half the size of the JDT refactoring engine and runs an order of magnitude faster. Further, a user study shows that R3 improved the success rate of retrotting design patterns by 25% up to 50%.

refactoring design patterns
R. Goncalves, D. Batory, J. Sobral, and T. Riche. From Software Extensions to Product Lines of Dataflow Programs . Software and Systems Modeling (SoSym), 2015 .

Dataflow programs are widely used. Each program is a directed graph where nodes are computations and edges indicate the flow of data. In prior work, we reverse-engineered legacy dataflow programs by deriving their optimized implementations from a simple specification graph using graph transformations called refinements and optimizations. In MDE-speak, our derivations were PIM-to-PSM mappings. In this paper, we show how extensions complement refinements, optimizations, and PIM-to-PSM derivations to make the process of reverse engineering complex legacy dataflow programs tractable. We explain how optional functionality in transformations can be encoded, thereby enabling us to encode product lines of transformations as well as product lines of dataflow programs.We describe the implementation of extensions in the ReFlO tool and present two non-trivial case studies as evidence of our work’s generality. Keywords: MDE, PIM, PSM, model transformation, software extension, dataflow programs, software product line.

MDE, MDA, MDD, DxT, refinements, optimizations, extensions
D. Batory and M. Azanza. Teaching Model Driven Engineering from a Relational Database Perspective . Software and Systems Modeling (SoSym), 2015 .

We reinterpret MDE from the viewpoint of relational databases to provide an alternative way to understand, demonstrate, and teach MDE using concepts and technologies that should be familiar to undergraduates. We use (1) relational database schemas to express metamodels, (2) relational databases to express models, (3) Prolog to express constraints and M2M transformations, (4) Java tools to implement M2T and T2M transformations, and (5) Java to execute transformations. Application case studies and a user study illuminate the viability and benefits of our approach.

MDE, MDA, MDD, MDELite, award
D. Batory, B. Marker, and R. van de Geijn. Opportunities in Computational Science to Advance Software Engineering . Computational Science and Engineering Software Sustainability and Productivity Challenges (CSESSP), Oct. 2015 .

We describe the disparity between current-day Software Engineering (SE) research and that which is needed to advance Computational Science and Engineering (CSE) research. We explain how overcome the SE/CSE gap and explain our progress to date.

DSLs, DxT, DLA, MDE, MDA, MDD
S. Souto, D. Gopinath, M. d'Amorim, D. Marinov, S. Khurshid, and D. Batory. Faster Bug Detection for Software Product Lines with Incomplete Feature Models . Software Product Line Conference (SPLC) . July 2015, Nashville, Tennessee.

A software product line (SPL) is a family of programs that are differentiated by features -- increments in functionality. Systematically testing an SPL is challenging because it requires running each test of a test suite against a combinatorial number of programs. Feature models capture dependencies among features and can (1) reduce the space of programs to test and (2) enable accurate categorization of failing tests as failures of programs or the tests themselves, not as failures due to illegal combinations of features. In practice, sadly, feature models are not always available. We introduce SPLif, the first approach for testing SPLs that does not require the a priori availability of feature models. Our insight is to use a profile of passing and failing test runs to quickly identify failures that are indicative of real problems in test or code rather than specious failures due to illegal feature combinations. Experimental results on five SPLs and one large configurable system (GCC) demonstrate the effectiveness of our approach. SPLif enabled the discovery of five news bugs in GCC, three of which have already been fixed.

testing, analysis.
J. Kim, D. Batory, and D. Dig. Scripting Parametric Refactorings in Java to Retrofit Design Patterns . International Conference on Software Maintenance and Evolution (ICSME) . Bremen, Germany, October 2015.

Retrofitting design patterns into a program by hand is tedious and error-prone. A programmer must distinguish refactorings that are provided by an IDE from those that must be realized manually, determine a precise sequence of these refactorings to apply, and perform this sequence repetitively to a laborious degree. We designed, implemented, and evaluated Reflective Refactoring (R2), a Java package to automate the creation of classical design patterns (Visitor, Abstract Factory, etc.), their inverses, and variants. We encoded 18 out of 23 Gang-of-Four design patterns as R2 scripts and explain why the remaining are inappropriate for refactoring engines. We evaluate the productivity and scalability of R2 with a case study of 6 real-world applications. In one case, R2 automatically created a Visitor with 276 visit methods by invoking 554 Eclipse refactorings in 10 minutes -- an achievement that could not be done manually. R2 also sheds light on why refactoring correctness, expressiveness, and speed are critical issues for scripting in next-generation refactoring engines.

refactoring design patterns
R. Goncalves. Parallel Programming by Transformation. Ph.D. dissertation, 2015, The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto, Portugal.

The development of effient software requires the selection of algorithms and optimizations tailored for each target hardware platform. Alternatively, performance portability may be obtained through the use of optimized libraries. However, currently all the invaluable knowledge used to build optimized libraries is lost during the development process, limiting its reuse by other developers when implementing new operations or porting the software to a new hardware platform.
To answer these challenges, we propose a model-driven approach and framework to encode and systematize the domain knowledge used by experts when building optimized libraries and program implementations. This knowledge is encoded by relating the domain operations with their implementations, capturing the fundamental equivalences of the domain, and defining how programs can be transformed by refinement (adding more implementation details), optimization (removing ineffiencies), and extension (adding features). These transformations enable the incremental derivation of efficient and correct by construction program implementations from abstract program specifications. Additionally, we designed an interpretations mechanism to associate different kinds of behavior to domain knowledge, allowing developers to animate programs and predict their properties (such as performance costs) during their derivation. We developed a tool to support the proposed framework, ReFlO, which we use to illustrate how knowledge is encoded and used to incrementally|and mechanically|derive effcient parallel program implementations in different application domains. The proposed approach is an important step to make the process of developing optimized software more systematic, and therefore more understandable and reusable. The knowledge systematization is also the first step to enable the automation of the development process.

DxT architectures dataflow categories higher-order transformations MDE student
D.E. Perry and D. Batory. A Theory about the Structure of GTSEs . General Theory of Software Engineering (GTSE), Florence, Italy, May 2015.

Managing complexity is a fundamental goal of software engineering. One of the core techniques that has been successful in practice is that which separate concerns, especially variants of architectural abstractions called components and connectors. We present and illustrate a theory about the general structure of General Theories of Software Engineering (GTSE) - namely, that they should be organized as components and connectors to distinguish conceptually distinct general theory elements and their inter-relationships and interdependencies. Doing so, we argue, separates concerns that should be distinct and not conflated, thereby increasing the value of GTSE efforts.

GTSE Structure, Theory Architecture, Component Theories, Connector Theories
D. Batory. A Theory of Modularity for Automated Software Design . Keynote at International Conference on Modularity (MODULARITY), Fort Collins, Colorado, March 2015

Automated Software Development (ASD) are technologies for developing customized programs automatically and compositionally from modules. The foundations of ASD are domain-specific algebras, where each program in the target domain maps to a unique expression. Algebraic identities are used to optimize programs automatically. In this keynote, I trace the history of ASD and present a general theory of modularity for ASD that follows from its tenets.

PPTX of presentation.

aspects, categories, commuting diagrams, feature models, geodesic, MDE, mixin layers, optimization, UML, relational query optimization
D. Batory, P. Hoefner, D. Koeppl, B. Moeller, and A. Zelend. Structured Document Algebra in Action . Software, Services and Systems (LNCS Volume 8950) 2015

A Structured Document Algebra (SDA) defines modules with variation points and how such modules compose. The basic operations are module addition and replacement. Repeated addition can create nested module structures. SDA also allows the decomposition of modules into smaller parts. In this paper we show how SDA modules can be used to deal algebraically with Software Product Lines (SPLs). In particular, we treat some fundamental concepts of SPLs, such as refinement and refactoring. This leads to mathematically precise formalisation of fundamental concepts used in SPLs, which can be used for improved Feature-Oriented Software Development (FOSD) tooling.

V. Weber UTFM -- A Next Generation Language and Tool for Feature Modeling . M.Sc. Thesis 2014, Department of Computer Science, University of Twente, Enschede, Netherlands.

We propose the next generation feature modeling language: UTFM (University of Twente/University of Texas Feature Models), that generalizes classical feature models with instance and group cardinalities, feature replication, feature attributes (arithmetic, boolean and string attributes) and complex cross-tree constraints. The language explicitly separates type declarations, defining feature types, hierarchical relations, attributes and constraints, from model configurations. Configurations consist of instance declarations of feature types and value assignments to feature attributes.
To validate the proposed language and analysis operations an proof-of-concept tool is implemented. In order to check satisfiability of configurations the tool translates UTFM models to Z3 decision problems (an off-the-shelf SMT theorem prover). To translate the models to the decision problems an UTFM to Z3 mapping is described for the various semantics.

student feature models analysis DRC semantics
B. Marker, D. Batory, F. Van Zee, R. van de Geijn. Making Scientific Computing Libraries Forward Compatible . Workshop on Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE 2014)

NSF's Software Infrastructure for Sustained Innovation funds the development of community software in support of scientific computing innovation. A requirement is that the developed software be sustainable. Design-by-Transformation (DxT) is an approach to software development that views libraries not as instantiated in code, but as expert knowledge that is combined with knowledge about a target architecture by a tool (DxTer) that synthesizes the library implementation. We argue that this approach makes libraries to some degree forward compatible in that a (disruptive) new architectural advance can be accommodated by encoding knowledge about that architecture. This is particularly important when bugs are not correctness bugs, but instead performance bugs that affect how fast an answer is obtained and/or how much energy is consumed to compute the answer. DxT allows a human expert to focus on developing the primitives from which libraries are constructed and new insights as opposed to the rote application of known ideas to entire libraries. We summarize our success in the domain of dense linear algebra as evidence of DxT’s potential.

B. Marker, D. Batory, R. van de Geijn. Understanding Performance Stairs: Elucidating Heuristics . Automated Software Engineering (ASE 2014)

How do experts navigate the huge space of implementations for a given specification to find an efficient choice with minimal searching? Answer: They use heuristics -- rules of thumb that are more street wisdom than scientific fact. We provide a scientific justification for Dense Linear Algebra (DLA) heuristics by showing that only a few decisions (out of many possible) are critical to performance; once these decisions are made, the die is cast and only relatively minor performance improvements are possible. The (implementation x performance) space of DLA is stair-stepped. Each stair is a set of implementations with very similar performance and (surprisingly) share key design decision(s). High-performance stairs align with heuristics that prescribe certain decisions in a particular context. Stairs also tell us how to tailor the search engine of a DLA code generator to reduce the time it needs to find implementations that are as good or better than those crafted by experts.

DxT DLA optimization
B. Marker. Design by Transformation: From Domain Knowledge to Optimized Program Generation . Ph.D. dissertation, 2014, The Department of Computer Sciences, The University of Texas at Austin.

Expert design knowledge is essential to develop a library of high-performance software. This includes how to implement and parallelize domain operations, how to optimize implementations, and estimates of which implementation choices are best. An expert repeatedly applies his knowledge, often in a rote and tedious way, to develop all of the related functionality expected from a domain-specific library. Expert knowledge is hard to gain and is easily lost over time when an expert forgets or when a new engineer starts developing code. The domain of dense linear algebra (DLA) is a prime example with software that is so well designed that much of experts' important work has become tediously rote in many ways. In this dissertation, we demonstrate how one can encode design knowledge for DLA so it can be automatically applied to generate code as an expert would or to generate better code. Further, the knowledge is encoded for perpetuity, so it can be reused to make implementing functionality on new hardware easier or it can be used to teach how software is designed to a non-expert. We call this approach to software engineering (encoding expert knowledge and automatically applying it) Design by Transformation (DxT). We present our vision, the methodology, a prototype code generation system, and possibilities when applying DxT to the domain of dense linear algebra.

student DxT
R.C. Goncalves, D. Batory, J.L. Sobral ReFlO: An Interactive Tool for Pipe-And-Filter Domain Specification and Program Generation . Software and Systems Modeling, 2014.

ReFlO is a framework and interactive tool to record and systematize domain knowledge used by experts to derive complex pipe-and-filter (PnF) applications. Domain knowledge is encoded as transformations that alter PnF graphs by refinement (adding more details), flattening (removing modular boundaries), and optimization (substituting inefficient PnF graphs with more efficient ones). All three kinds of transformations arise in reverse-engineering legacy PnF applications. We present the conceptual foundation and tool capabilities of ReFlO, illustrate how parallel PnF applications are designed and generated, and how domain-specific libraries of transformations are developed.

DxT, refinements, optimizations, Perry Substitition Principle, PSP
B. Delaware. Feature Modularity in Mechanized Reasoning . Ph.D. dissertation, December 2013, The Department of Computer Sciences, The University of Texas at Austin.

Complex systems are naturally understood as combinations of their distinguishing characteristics or features. Distinct features differentiate between variations of configurable systems and also identify the novelties of extensions. The implementation of a conceptual feature is often scattered throughout an artifact, forcing designers to understand the entire artifact in order to reason about the behavior of a single feature. It is particularly challenging to independently develop novel extensions to complex systems as a result. This dissertation shows how to modularly reason about the implementation of conceptual features in both the formalizations of programming languages and object-oriented software product lines. In both domains, modular verification of features can be leveraged to reason about the behavior of artifacts in which they are included: fully mechanized metatheory proofs for programming languages can be synthesized from independently developed proofs, and programs built from well-formed feature modules are guaranteed to be well-formed without needing to be typechecked. Modular reasoning about individual features can furthermore be used to efficiently reason about families of languages and programs which share a common set of features.

Coq source code used.

student
B. Marker, R. van de Geijn, and D. Batory Interfaces are Key . 1st Int. Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering

Many dense linear algebra (DLA) operations are easy to understand at a high level and users get functional DLA code on new hardware relatively quickly. As a result, many people consider DLA to be a 'solved domain'. The truth is that DLA is not solved. DLA experts are rare because the 'tricks' and variety of algorithms they need to get high performance take time to learn. DLA implementations are only available on a new architecture when an expert with enough experience goes through a rote process to implement many related DLA operations. While so much of the manual work is rote, this hardly suggests the domain is 'solved'. We have not proven that we understand the field until we have automated the expert. Automate the expert for the entire field, and the field is closed. We view that goal as the equivalent of going to Mars. In practice, we will get to the moon automatically, and experts will then be freed up to worry about how to get from there to Mars.

DxT DLA
C.H.P. Kim. Systematic Techniques for Efficiently Checking Software Product Lines . Ph.D. dissertation, September 2013, The Department of Computer Sciences, The University of Texas at Austin.

A Software Product Line (SPL) is a family of related programs, which of each is defined by a combination of features. By developing related programs together, an SPL simultaneously reduces programming effort and satisfies multiple sets of requirements. Testing an SPL efficiently is challenging because a property must be checked for all the programs in the SPL, the number of which can be exponential in the number of features. In this dissertation, we present a suite of complementary static and dynamic techniques for efficient testing and runtime monitoring of SPLs, which can be divided into two categories. The first prunes programs, termed configurations, that are irrelevant to the property being tested. More specifically, for a given test, a static analysis identifies features that can influence the test outcome, so that the test needs to be run only on programs that include these features. A dynamic analysis counterpart also eliminates configurations that do not have to be tested, but does so by checking a simpler property and can be faster and more scalable. In addition, for runtime monitoring, a static analysis identifies configurations that can violate a safety property and only these configurations need to be monitored. When no configurations can be pruned, either by design of the test or due to ineffectiveness of program analyses, runtime similarity between configurations, arising due to design similarity between configurations of a product line, is exploited. In particular, shared execution runs all the configurations together, executing bytecode instructions common to the configurations just once. Deferred execution improves on shared execution by allowing multiple memory locations to be treated as a single memory location, which can increase the amount of sharing for object-oriented programs and for programs using arrays. The techniques have been evaluated and the results demonstrate that the techniques can be effective and can advance the idea that despite the feature combinatorics of an SPL structure can be exploited by automated analyses to make tesing more efficient.

student
N. Siegmund, S. Kolesnikov, C. Kaestner, S. Apel, D. Batory, M. Rosenmueller, and G. Saake. Performance Prediction in the Presence of Feature Interactions (Extended Abstract) . Software Engineering (SE) 2014 .

J. Kim, D. Batory, D. Dig. Can Undergraduates Script Their Own Refactorings? . Workshop on Refactoring Tools (WRT) 2013

We present a status report on a project to build a refactoring engine whose primary goal is to allow undergraduates to write classical and neo-classical refactorings (pull-up, class partitioning) and design patterns (visitor, framework) as parameterized refactoring scripts in Java. We explain the first step of our work that creates a reflection-like interface to expose the structure of an Eclipse JDT application as Java objects; methods of these objects are refactorings. Doing so hides the complexity of JDT refactoring code and tools, so that refactoring scripts can be written as compact Java methods. We present preliminary performance results of scripting JDT refactorings and sketch the next steps of our work.

refactorings, reflective, Eclipse
D. Batory, P. Hoefner, B. Moeller, A. Zelend. Features, Modularity, and Variation Points . Workshop on Feature-Oriented Software Development (FOSD) 2013

A feature interaction algebra (FIA) is an abstract model of features, feature interactions, and their compositions. A structured document algebra (SDA) defines modules with variation points and how such modules compose. We present both FIA and SDA in this paper, and homomorphisms that relate FIA expressions to SDA expressions. This leads to mathematically precise formalizations of fundamental concepts used in software product lines, which can be used for improved FOSD tooling and teaching material.

feature interactions, optional feature problem
D. Batory, R. Goncalves, B. Marker, J. Siegmund. Dark Knowledge and Graph Grammars in Automated Software Design . Keynote at Conference on Software Language Engineering (SLE) 2013 .

Mechanizing the development of hard-to-write and costly-to-maintain software is the core problem of automated software design. Encoding expert knowledge (a.k.a. dark knowledge) about a software domain is central to its solution. We assert that a solution can be cast in terms of the ideas of language design and engineering. Graph grammars can be a foundation for modern automated software development. The sentences of a grammar are designs of complex dataflow systems. We explain how graph grammars provide a framework to encode expert knowledge, produce correct-by-construction derivations of dataflow applications, enable the generation of high-performance code, and improve how software design of dataflow applications can be taught to undergraduates.

keynote, DxT, MDE, MDA, MDD
D. Batory, E. Latimer, and M. Azanza. Teaching Model Driven Engineering from a Relational Database Perspective . Model Driven Engineering Languages and Systems (MODELS) 2013 .

We reinterpret MDE from the viewpoint of relational databases to provide an alternative way to teach, understand, and demonstrate MDE using concepts and technologies that should be familiar to undergraduates. We use (1) relational databases to express models and metamodels, (2) Prolog to express constraints and M2M transformations, (3) Java tools to implement M2T and T2M transformations, and (4) OO shell-scripting languages to compose MDE transformations. Case studies demonstrate the viability of our approach.

MDE, MDA, MDD, MDELite
C.H.P. Kim, D. Marinov, S. Khurshid and D. Batory. SPLat: Lightweight Dynamic Analysis for Reducing Combinatorics in Testing Configurable Systems . European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE) 2013 .

Many software systems can be configured through dynamic and/or static selection of configuration parameters. For example, software product lines specify a family of programs where a unique combination of features defines a program. Systematically testing such systems is expensive because it can require running each test against a combinatorial number of configurations. Fortunately, a test is often independent of many configuration parameters and need not be run against every combination. Therefore, configurations that are not required for a test can be pruned from its execution. This paper presents SPLat, a novel technique that dynamically prunes such configurations. Our key insight is that the configurations that must be run for a test can be determined during test execution by monitoring accesses to configuration parameters. SPLat achieves an optimal reduction in the number of configurations and provides a lightweight, more widely applicable technique compared to previous approaches based on static analysis and heavyweight dynamic execution. Experimental results on seven product lines written in Java show that SPLat substantially reduces the total test execution time in most cases. Moreover, we apply SPLat on a large industrial code written in Ruby on Rails.

testing, analysis.
B. Marker, D. Batory, and R. van de Geijn. A Case Study in Mechanically Deriving Dense Linear Algebra Code . International Journal of High Performance Computing Applications .

Design by Transformation (DxT) is a top-down approach to mechanically derive high-performance algorithms for dense linear algebra. We use DxT to derive the implementation of a representative matrix operation, two- sided Trmm. We start with a knowledge base of transformations that were encoded for a simpler set of operations, the level-3 BLAS, and add only a few transformations to accommodate the more complex two-sided Trmm. These additions explode the search space of our prototype system, DxTer, requiring the novel techniques defined in this paper to eliminate large segments of the search space that contain suboptimal algorithms. Performance results for the mechanically optimized implementations on 8,192 cores of a BlueGene/P architecture are given.

DxT, DLA, MDE, MDA, MDD
D. Batory. The Challenges and Science of Variability . Keynote at Dagstuhl Seminar on Analysis, Test and Verification in The Presence of Variability, February 2013

Variability is everywhere: in our social structures, in the universe, and in our software. In this talk, I review some of the key challenges that variability brings to software development, and how theories of automated software design will hinge on mathematical theories of variability.

Presentation Slides.

keynote, homomorphisms, testing
B. Marker, D. Batory, R. van de Geijn. DSLs, DLA, DxT, and MDE in CSE . International Workshop on Software Engineering for Computational Science and Engineering (SECSE), May 2013

We narrate insights from a collaboration between researchers in Software Engineering (SE) and in the domain of Dense Linear Algebra (DLA) libraries. We highlight our impressions of how software development for computational science has traditionally been different from the development of software in other domains. We observe that scientific software (at least DLA libraries) is often developed by domain experts rather than legions of programmers. For this reason, researchers in SE need to impact the productivity of experts rather than the productivity of the masses. We document this and other lessons learned.

DSLs, DxT, DLA, MDE, MDA, MDD
D. Batory. Why (Meta-)Theories of Automated Software Design Are Essential: A Personal Perspective . SEMAT Workshop on a General Theory of Software Engineering (GTSE), May 2013

Program generators are tools that automatically construct customized programs in a particular domain. Generators mechanize implicit theories of how a domain expert would go about writing an efficient program. Abstracting the core activities of a domain expert and automating them is analogous to creating and evaluating theories in physics and other natural sciences. Theories have a revered place in natural sciences; eventually theories will assume a comparable place in automated software design. The reason is simple economics: generators will remove the burden of difficult or mundane tasks from an engineer to a machine

B. Marker, D. Batory, R. van de Geijn. Code Generation and Optimization of Distributed-Memory Dense Linear Algebra Kernels . International Workshop on Automatic Performance Tuning (IWAPT), June 2013

Design by Transformation (DxT) is an approach to software development that encodes domain-specific programs as graphs and expert design knowledge as graph transformations. The goal of DxT is to mechanize the generation of highly-optimized code. This paper demonstrates how DxT can be used to transform sequential specifications of an important set of Dense Linear Algebra (DLA) kernels, the level-3 Basic Linear Algebra Subprograms (BLAS3), into high-performing library routines targeting distributed-memory (cluster) architectures. Getting good BLAS3 performance for such platforms requires deep domain knowledge, so their implementations are manually coded by experts. Unfortunately, there are few such experts and developing the full variety of BLAS3 implementations takes a lot of repetitive effort. A prototype tool, DxTer, automates this tedious task. We explain how we build on previous work to represent loops and multiple loop-based algorithms in DxTer. Performance results on a BlueGene/P parallel supercomputer show that the generated code meets or beats implementations that are hand-coded by a human expert and outperforms the widely used ScaLAPACK library.

DxT, DLA, MDE, MDA, MDD, optimization, streaming
C.H.P. Kim, S. Khurshid, D. Batory. Shared Execution for Efficiently Testing Product Lines . International Symposium on Software Reliability Engineering (ISSRE 2012), November 2012 .

A software product line (SPL) is a family of related programs, each of which is uniquely defined by a combination of features. Testing an SPL requires running each of its programs, which may be computationally expensive as the number of programs in an SPL is potentially exponential in the number of features. It is also wasteful since instructions common to many programs must be repeatedly executed, rather than just once. To reduce this waste, we propose the idea of shared execution, which runs instructions just once for a set of programs until a variable read yields multiple values, causing execution to branch for each value until a common execution point that allows shared execution to resume. Experiments show that shared execution can be faster than conventionally running each program from start to finish, despite its overhead.

testing, analysis.
T. Riche, R. Goncalves, B. Marker, and D. Batory. Pushouts in Software Architecture Design . Generative Programming and Component Engineering (GPCE), September 2012

A classical approach to program derivation is to progressively extend a simple specification and then incrementally refine it to an implementation. We claim this approach is hard or impractical when reverse engineering legacy software architectures. We present a case study that shows optimizations and pushouts -- in addition to refinements and extensions -- are essential for practical stepwise development of complex software architectures.

architectures, DxT, MDE, MDA, MDD, riche
B. Marker, J. Poulson, D. Batory, and R. van de Geign. Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer . International Workshop on Automatic Performance Tuning (IWAPT), July 2012

To implement dense linear algebra algorithms for distributed-memory computers, an expert applies knowledge of the domain, the target architecture, and how to parallelize common operations. This is often a rote process that becomes tedious for a large collection of algorithms. We have developed a way to encode this expert knowledge such that it can be applied by a system to generate mechanically the same (and sometimes better) highly-optimized code that an expert creates by hand. This paper illustrates how we have encoded a subset of this knowledge and how our system applies it and searches a space of generated implementations automatically.

architectures, DxT, MDE, MDA, MDD.
J. Feigenspan, D. Batory, and T. Riche. Is the Derivation of a Model Easier to Understand than the Model Itself? . International Conference on Program Comprehension (ICPC), June 2012

Software architectures can be presented by graphs with components as nodes and connectors as edges. These graphs, or models, typically encode expert domain knowledge, which makes them difficult to understand. Hence, instead of presenting a complete complex model, we can derive it from a simple, easy-to-understand model by a set of easy-to-understand transformations. In two controlled experiments, we evaluate whether a derivation of a model is easier to understand than the model itself.

architectures, DxT, MDE, MDA, MDD, riche
N. Siegmund, S.S. Kolesnikov, C. Kaestner, S. Apel, D. Batory, M. Rosenmueller, G. Saake. Predicting Performance via Automated Feature-Interaction Detection . International Conference on Software Engineering (ICSE), June 2012

Customizable programs and program families provide user-selectable features to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is infeasible. Our work aims at predicting program performance based on selected features. The challenge is predicting performance accurately when features interact. An interaction occurs when a feature combination has an unexpected influence on performance. We present a method that automatically detects performance feature interactions to improve prediction accuracy. To this end, we propose three heuristics to reduce the number of measurements required to detect interactions. Our evaluation consists of six real-world case studies from varying domains (e.g. databases, compression libraries, and web server) using different configuration techniques (e.g., configuration files and preprocessor flags). Results show, on average, a prediction accuracy of 95%.

kaestner, feature interactions, performance prediction, rosenmueller
D. Batory. P2: A Twenty+ Year Retrospective . Keynote at Dagstuhl Seminar on Software Synthesis , April 2012

There is a resurgence in work on container data structures. I review work on the P2 series of container data structure generators, and present lessons that I've learned and that others may find of interest. The key ability of generators is optimization: finding feasible solutions to a programming problem are easy in many domains. The hard part is finding efficient solutions.
Presentation Slides.

keynote, P2, optimizations
B. Marker, A. Terrel, J. Poulson, R. van de Geijn, and D. Batory. Mechanizing the Expert Dense Linear Algebra Developer (Poster Paper) . Principles and Practice of Parallel Programming (PPoPP), February 2012.

The efforts of an expert to parallelize and optimize a dense linear algebra algorithm for distributed-memory targets are largely mechanical and repetitive. We demonstrate that these efforts can be encoded and automatically applied to obviate the manual implementation of many algorithms in high-performance code.

DxT, MDE, DLA, dense linear algebra
D. Batory, P. Hoefner, and J. Kim. Feature Interactions, Products, and Composition . Generative Programming and Component Engineering (GPCE), October 2011.

The relationship between feature modules and feature interactions is not well-understood. To explain classic examples of feature interactions, we show that features are not only composed sequentially by the (dot) operation, but also by the cross-product (x) and interaction (#) operations that heretofore were implicit in the literature. We present a formal model of these operations, show how it connects and explains previously unrelated results in Feature Oriented Soft- ware Development (FOSD), and describe a tool, based on our formalism, that demonstrates how changes in composed documents can be back-propagated to their original feature module definitions, thereby improving FOSD tooling.

feature interactions, optional feature problem
B. Delaware, W. Cook, and D. Batory. Product Lines of Theorems . SPLASH/OOPSLA Conference , October 2011

Mechanized proof assistants are powerful verification tools, but proof development can be difficult and time-consuming. When verifying a family of related programs, the effort can be reduced by proof reuse. In this paper, we show how to engineer product lines with theorems and proofs built from feature modules. Each module contains proof fragments which are composed together to build a complete proof of correctness for each product. We consider a product line of programming languages, where each variant includes metatheory proofs verifying the correctness of its semantic definitions. This approach has been realized in the Coq proof assistant, with the proofs of each feature independently certifiable by Coq. These proofs are composed for each language variant, with Coq mechanically verifying that the composite proofs are correct. As validation, we formalize a core calculus for Java in Coq which can be extended with any combination of casts, interfaces, or generics.

oopsla, semantics, verification, feature interaction
M. Azanza. Model Driven Product Line Engineering: Core Asset and Process Implications . Ph.D. dissertation, 2011, Department of Computer Languages and Systems, University of the Basque Country, Spain.

Reuse is at the heart of major improvements in productivity and quality in Software Engineering. Both Model Driven Engineering (MDE) and Software Product Line Engineering (SPLE) are software development paradigms that promote reuse. Specifically, they promote systematic reuse and a departure from craftsmanship towards an industrialization of the software development process. MDE and SPLE have established their benefits separately. Their combination, here called Model Driven Product Line Engineering (MDPLE), gathers together the advantages of both. Nevertheless, this blending requires MDE to be recasted in SPLE terms. This has implications on both the core assets and the software development process. The challenges are twofold: (i) models become central core assets from which products are obtained and (ii) the software development process needs to cater for the changes that SPLE and MDE introduce. This dissertation proposes a solution to the first challenge following a feature oriented approach, with an emphasis on reuse and early detection of inconsistencies. The second part is dedicated to assembly processes, a clear example of the complexity MDPLE introduces in software development processes. This work advocates for a new discipline inside the general software development process, i.e., the Assembly Plan Management, which raises the abstraction level and increases reuse in such processes. Different case studies illustrate the presented ideas.

student
D. Batory and B. Delaware. Towards Verification of Product Lines . Keynote at Interactive Theorem Proving (ITP), August 2011

Although mechanized proof assistants are powerful verification tools, proof development can still be difficult and time-consuming. It becomes even more challenging when proofs are needed for product lines. A product line is a family of similar programs. Each program is constructed by a distinct (linear) combination of features, where a feature or feature module encapsulates program fragments that are added to a program to introduce a new capability or functionality to that program.
The first part of my presentation reviews basic concepts on product lines and how programs are synthesized using feature modules. The second part addresses product line verification. I explain how proofs for product lines can be engineered using feature modules. Each module contains proof fragments which are composed to build a complete proof of correctness for each product. A product line of programming languages is considered, where each variant includes metatheory proofs verifying the correctness of its syntax and semantic definitions. The approach has been realized in the Coq proof assistant, where the proofs of each feature are independently certifiable by Coq. Proofs are composed for each language variant, where Coq mechanically verifies that the composite proofs are correct. As validation, a core calculus for Java in Coq was formalized which can be extended with any combination of casts, interfaces, or generics.
This is joint work with Benjamin Delaware and William Cook.

Presentation Slides.

keynote, semantics, safe composition, verification, feature interaction
B. Marker, D. Batory, J. Poulson, and R. van de Geijn. How a Domain-Specific Language Enables the Automation of Optimized Code for Dense Linear Algebra (Extended Abstract) . International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), May 2011

DxT, MDE, MDA, MDE, streaming architectures, DLA
J. Kim. Paan: A Tool for Back-Propagating Changes to Projected Documents . M.Sc. Thesis, May 2011, The Department of Computer Sciences, The University of Texas at Austin.

Research in Software Product Line Engineering (SPLE) traditionally focuses on product derivation. Prior work has explored the automated derivation of products by module composition. However, it has so far neglected propagating changes (edits) in a product back to the product line definition. A domain-specific product should be possible to update its features locally, and later these changes should be propagated back to the product line definition automatically. Otherwise, the entire product line has to be revised manually in order to make the changes permanent. Although this is the current state, it is a very error-prone process. To address these issues, we present a tool called Paan to create product lines of MS Word documents with back-propagation support. It is a diff-based tool that ignores unchanged fragments and reveals fragments that are changed, added or deleted. Paan takes a document with variation points (VPs) as input, and shreds it into building blocks called tiles. Only those tiles that are new or have changed must be updated in the tile repository. In this way, changes in composed documents can be back-propagated to their original feature module definitions. A document is synthesized by retrieving the appropriate tiles and composing them.

Paan web page.

student analysis origami safe composition verification
D. Dig and D. Batory, The 4th Workshop on Refactoring Tools (WRT'11) . March 2011.

Refactoring is the process of applying behavior-preserving transformations to a program with the objective of improving the program's design. A specific refactoring is identified by a name (e.g., Extract Method), a set of preconditions, and a set of transformations that need to be performed. Tool support for refactoring is essential because checking the preconditions of refactoring often requires nontrivial program analysis, and applying transformations may affect many locations throughout a program. In recent years, the emergence of light-weight programming methodologies such as Extreme Programming has generated a great amount of interest in refactoring, and refactoring support has become a required feature in today's IDEs. This workshop is a continuation of a series of previous workshops (ECOOP 2007, OOPSLA 2008 and 2009 -- see http://refactoring.info/WRT) where researchers and developers of refactoring tools can meet, discuss recent ideas and work, and view tool demonstrations.

refactoring
C. Kaestner. Virtual Separation of Concerns: Towards Preprocessors 2.0 . Ph.D. Thesis, Department of Technical and Business Information Systems, University of Magdeburg, Germany, May 2010.

Conditional compilation with preprocessors such as cpp is a simple but effective means to implement variability. By annotating code fragments with #ifdef and #endif directives, different program variants with or without these annotated fragments can be created, which can be used (among others) to implement software product lines. Although, such annotation-based approaches are frequently used in practice, researchers often criticize them for their negative effect on code quality and maintainability. In contrast to modularized implementations such as components or aspects, annotation-based implementations typically neglect separation of concerns, can entirely obfuscate the source code, and are prone to introduce subtle errors. Our goal is to rehabilitate annotation-based approaches by showing how tool support can address these problems. With views, we emulate modularity; with a visual representation of annotations, we reduce source code obfuscation and increase program comprehension; and with disciplined annotations and a product-line-aware type system, we prevent or detect syntax and type errors in the entire software product line. At the same time we emphasize unique benefits of annotations, including simplicity, expressiveness, and being language independent. All in all, we provide tool-based separation of concerns without necessarily dividing source code into physically separated modules; we name this approach virtual separation of concerns. We argue that with these improvements over contemporary preprocessors, virtual separation of concerns can compete with modularized implementation mechanisms. Despite our focus on annotation-based approaches, we do intend not give a definite answer on how to implement software product lines. Modular implementations and annotation-based implementations both have their advantages; we even present an integration and migration path between them. Our goal is to rehabilitate preprocessors and show that they are not a lost cause as many researchers think. On the contrary, we argue that -- with the presented improvements -- annotation-based approaches are a serious alternative for product-line implementation.

student CIDE safe composition kaestner
S. Apel, D. Batory, K. Czarnecki, F. Heidenreich, C. Kaestner, and O. Nierstrasz. Proceedings of the 2nd International Workshop on Feature-Oriented Software Development (FOSD), October 2010. .
kaestner
C.H.P. Kim, D. Batory, and S. Khurshid. Reducing Combinatorics in Testing Product Lines . Aspect Oriented Software Development (AOSD), March 2011.

A Software Product Line (SPL) is a family of programs where each program is defined by a unique combination of features. Testing or checking properties of an SPL is hard as it may require the examination of a combinatorial number of programs. In reality, however, features are often irrelevant for a given test | they augment, but do not change, existing behavior, making many feature combinations unnecessary as far as testing is concerned. In this paper we show how to reduce the amount of effort in testing an SPL. We represent an SPL in a form where conventional static program analysis techniques can be applied to find irrelevant features for a test. We use this information to reduce the combinatorial number of SPL programs to examine.

testing, analysis, feature models, analysis, feature models, testing
D. Batory FOSD - A Science of Software Design . Keynote at Dagstuhl Seminar on Feature Oriented Software Development, January 2011

Presentation Slides.

keynote, categories, commuting diagrams, feature interactions, MDA, MDE, MDD, mixin layers
D. Batory. Thoughts on Automated Software Design and Synthesis . Future of Software Engineering Research Workshop (FoSER), November 2010.

I summarize some personal observations on the topic of automated software design and synthesis that I accumulated over twenty years. They are intended to alert researchers to pitfalls that may be encountered and to identify goals for future efforts in advancing software engineering education and research.

computational design, refactoring, composition.
C.H.P. Kim, E. Bodden, D. Batory, and S. Khurshid. Enforcing Safety Properties in Product Lines . Runtime Verification (RV), November 2010.

A product line is a family of programs where each program is defined by a unique combination of features. Product lines, like conventional programs, can be checked for safety properties through execution monitoring. However, applying execution monitoring techniques for conventional programs against product lines is expensive because one would have to monitor all of the product line's programs, the number of which can be exponential in the number of features. Also, doing so would be wasteful because many programs can provably never violate a stated property. We introduce a monitoring technique dedicated to product lines that given a safety property, determines the statements that need to be instrumented for the corresponding execution monitor to trigger and the feature combinations required for these statements to be reached. Thus, we identify feature combinations that cannot possibly violate the stated safety property, reducing the number of programs to monitor. Experiments show that our technique is effective, particularly for safety properties that crosscut many optional features.

testing, analysis, feature models
C.H.P. Kim, D. Batory, and S. Khurshid. Eliminating Products to Test in a Software Product Line (Short Paper) . Automated Software Engineering (ASE), September 2010.

A Software Product Line (SPL) is a family of programs. Testing an SPL is a challenge because the number of programs to examine may be exponential in the number of features. However, there are features whose absence or presence has no bearing on the outcome of a test. We can ignore such irrelevant features and consider combinations of only the remaining features, thereby eliminating unnecessary test runs. In this paper, we propose a product line representation that enables a conventional static program analysis to be applied. We then present a classification of features that can be used to narrow down the search for relevant features. Conditions of relevance that a static analysis can check are outlined and a procedure that uses the set of relevant features to reduce the combinatorial number of programs to test is sketched.

analysis, feature models, testing
T.L. Riche, H.M. Vin, and D. Batory. Transformation-Based Parallelization of Request-Processing Applications . Model Driven Engineering Languages and Systems (MODELS), October 2010.

Multicore, multithreaded processors are rapidly becoming the platform of choice for high-throughput request-processing applications (RPAs). We refer to this class of modern parallel platforms as multi-* systems. In this paper, we describe the design and implementation of Lagniappe, a translator that simplifies RPA development by transforming portable models of RPAs to highthroughput multi-* executables. We demonstrate Lagniappe’s effectiveness with a pair of stateful RPA case studies.

MDD riche MDE computational design award streaming architectures DxT
M. Azanza, D. Batory, O. Diaz, and S. Trujillo. Domain-Specific Composition of Model Deltas . Int. Conference on Model Transformations (ICMT) , LNCS 6142, July 2010.

We present a general approach to the incremental development of model-based applications using endogenous transformations, i.e. transformations whose input and output models conform to the same metamodel. Our work describes these transformations as model deltas that, when composed, deliver a complete model. We establish a relationship between a metamodel and its corresponding delta metamodel, show how model deltas can be defined as model changes (additions), explain how deltas can be composed using domain-specific composition algorithms, and propose metamodel annotations to specify these algorithms. We present different case studies as proofs of concept.

MDD MDE categories computational design
E. Uzuncaova, S. Khurshid, and D. Batory. Incremental Test Generation for Software Product Lines . IEEE Transactions on Software Engineering (IEEE TSE), Feb. 2010.

Recent advances in mechanical techniques for systematic testing have increased our ability to automatically find subtle bugs, and hence to deploy more dependable software. This paper builds on one such systematic technique, scope-bounded testing, to develop a novel specification-based approach for efficiently generating tests for products in a software product line. Given properties of features as firstorder logic formulas in Alloy, our approach uses SAT-based analysis to automatically generate test inputs for each product in a product line. To ensure soundness of generation, we introduce an automatic technique for mapping a formula that specifies a feature into a transformation that defines incremental refinement of test suites. Our experimental results using different data structure product lines show that an incremental approach can provide an order of magnitude speed-up over conventional techniques. We also present a further optimization using dedicated integer constraint solvers for feature properties that introduce integer constraints, and show how to use a combination of solvers in tandem for solving Alloy formulas.

MDD MDE analysis commuting diagrams geodesic testing verification
D. Batory. On the Importance and Challenges of FOSD. Keynote at First Workshop on Feature Oriented Software Development (FOSD), Denver, Colorado, October 2009

Among the key elements of mature engineering is automated production: we understand the technical problems and we understand their solutions; our goal is to automate production as much as possible to increase product quality, reduce costs and time-to-market, and be adept at creating new products quickly and cheaply. Automated production is a technological statement of maturity: 'We have built these products so often by hand that we have gotten it down to a Science'. Models of automated production are indeed the beginnings of a Science of Automated Design (SOAD). Feature Oriented Software Development (FOSD) will play a fundamental role in SOAD, and I believe also play a fundamental role in the future of software engineering. In this presentation, I explain what distinguishes FOSD from other software design disciplines and enumerate key technical barriers that lie ahead for FOSD and SOAD.

PDF of presentation.
Survey and responses for this keynote.

keynote, evolution, feature interactions, featue models, kubes, origami, refactoring, semantics, testing, verification
D. Batory (work with T. Riche). Refinement and Optimization of Streaming Architectures. Keynote at Software Engineering and Data Engineering (SEDE), Las Vegas, June 2009 and the Software Engineering and Databases (JISDB), San Sebastian, Spain, September 2009

We present a Model Driven Engineering approach to explain, verify, build, and test dataflow or streaming software architectures that are to be parallelized for performance or availability. Componentconnector models are incrementally elaborated by transformations that refine or optimize architectural designs. We re-engineered two significant case studies to illustrate the generality of our work: (1) recoverable crash fault tolerant servers and (2) join parallelizations in database machines.

PDF of presentation.

DxT, keynote, pipe-and-filter, architectural refinement, optimization, riche

G. Freeman, D. Batory, G. Lavender, and J.N. Sarvela. Lifting Transformational Models of Product Lines: A Case Study.. Software and Systems Modeling. October 2009. This is a journal version of our 2008 ICMT paper

Model driven development (MDD) of software product lines (SPLs) merges two increasing important paradigms that synthesize programs by transformation. MDD creates programs by transforming models, and SPLs elaborate programs by applying transformations called features. In this paper, we present the design and implementation of a transformational model of a product line of scalar vector graphics and JavaScript applications. We explain how we simplified our implementation by lifting selected features and their compositions from our original product line (whose implementations were complex) to features and their compositions of another product line (whose specifications were simple). We used operators to map higher-level features and their compositions to their lower-level counterparts. Doing so exposed commuting relationships among transformations (features) in both product lines that helped validate our model and implementation.

higher-order transformations, MDD, MDE, MDA, award

M. Kuhlemann, D. Batory, and S. Apel. Safe Composition of Non-Monotonic Features . Generative Programming and Component Engineering (GPCE), October 2009.

Programs can be composed from features. We want to verify automatically that all legal combinations of features can be composed safely without errors. Prior work on this problem assumed that features add code monotonically. We generalize prior work to enable features to both add and remove code, describe our analyses and implementation, and review case studies. We observe that more expressive features can increase the complexity of developed programs rapidly up to the point where automated concepts as presented in this paper are not a helpful tool but a necessity for verification.

refactoring, safe composition

M. Kuhlemann, D. Batory, and S. Apel. Refactoring Feature Modules . International Conference on Software Reuse (ICSR), September 2009.

In feature-oriented programming, a feature is an increment in program functionality and is implemented by a feature module. Programs are generated by composing feature modules. A generated program may be used by other client programs but occasionally must be transformed to match a particular legacy interface before it can be used. We call the mismatch of the interface of a generated program and a client-desired interface an incompatibility. We introduce the notion of refactoring feature modules (RFMs) that extend feature modules with refactorings. We explain how RFMs reduce incompatibilities and facilitate reuse. We report our experiences on five case studies.

refactoring

B. Delaware, W. Cook, and D. Batory. Fitting the Pieces Together: A Machine-Checked Model of Safe Composition . ACM SIGSOFT FSE, August 2009.

Programs of a software product line can be synthesized by composing features which implement a unit of program functionality. In most product lines, only some combination of features are meaningful; feature models express the highlevel domain constraints that govern feature compatibility. Product line developers also face the problem of safe composition --whether every product allowed by a feature model is type-safe when compiled and run. To study the problem of safe composition, we present Lightweight Feature Java (LFJ), an extension of Lightweight Java with support for features. We define a constraint-based type system for LFJ and prove its soundness using a full formalization of LFJ in Coq. In LFJ, soundness means that any composition of features that satisfies the typing constraints will generate a well-formed LJ program. If the constraints of a feature model imply these typing constraints then all programs allowed by the feature model are type-safe.

feature models, mixin layers, semantics, safe composition

C. Kaestner, S. Apel, S.S. ur Rahman, M. Rosenmueller, and D. Batory. On the Impact of the Optional Feature Problem: Analysis and Case Studies. Software Product Line Conference (SPLC), September 2009.

A software product-line is a family of related programs that are distinguished in terms of features. A feature implements a stakeholders’ requirement. Different program variants specified by distinct feature selections are produced from a common code base. The optional feature problem describes a common mismatch between variability intended in the domain and dependencies in the implementation. When this occurs, some variants that are valid in the domain cannot be produced due to implementation issues. There are many different solutions to the optional feature problem, but they all suffer from drawbacks such as reduced variability, increased development effort, reduced efficiency, or reduced source code quality. In this paper, we examine the impact of the optional feature problem in two case studies in the domain of embedded database systems, and we survey different state-ofthe- art solutions and their trade-offs. Our intension is to raise awareness of the problem, to guide developers in selecting an appropriate solution for their product-line project, and to identify opportunities for future research.

kaestner, rosenmueller, derivatives, optional feature problem

C. Kaestner, S. Apel, S. Trujillo, M. Kuhlemann, and D. Batory. Guaranteeing Syntactic Correctness for All Product Line Variants: A Language-Independent Approach . International Conference Objects, Models, Components, Patterns (TOOLS), June 2009.

A software product line (SPL) is a family of related program variants in a well-defined domain, generated from a set of features. A fundamental difference from classical application development is that engineers develop not a single program but a whole family with hundreds to millions of variants. This makes it infeasible to separately check every distinct variant for errors. Still engineers want guarantees on the entire SPL. A further challenge is that an SPL may contain artifacts in different languages (code, documentation, models, etc.) that should be checked. In this paper, we present CIDE, an SPL development tool that guarantees syntactic correctness for all variants of an SPL. We show how CIDE’s underlying mechanism abstracts from textual representation and we generalize it to arbitrary languages. Furthermore, we automate the generation of safe plug-ins for additional languages from annotated grammars. To demonstrate the language-independent capabilities, we applied CIDE to a series of case studies with artifacts written in Java, C++, C, Haskell, ANTLR, HTML, and XML.

kaestner, feature interactions, verification

T. Thuem. Reasoning About Feature Model Edits . B.Sc. Thesis, Department of Technical and Business Information Systems, University of Magdeburg, Germany, May 2008.

Feature modeling is a technique in domain analysis to describe the variabilities and commonalities among similar programs. The valid combinations of features are specified in a feature model. Automated analysis of feature models reveals errors and supports the configuration process. Feature models evolve over time and there is a need for edits on feature models. We categorize edits into refactorings, specializations, and generalizations, and arbitrary edits to support the modeling process. This paper presents an algorithm to reason about feature model edits and their categories. Two stand-alone feature models can be taken as input, where the set of features is not necessary identical. We evaluate the algorithm with random feature models to show that the algorithm scales even for larger feature models.

student, feature models, analysis, thuem

B. Delaware, W. Cook, and D. Batory. A Machine-Checked Model of Safe Composition. Foundations of Aspect-Oriented Languages (FOAL), March 2009.

Programs of a software product line can be synthesized by composing features which implement some unit of program functionality. In most product lines, only some combination of features are meaningful; feature models express the high-level domain constraints that govern feature compatibility. Product line developers also face the problem of safe composition -- whether every product allowed by a feature model is type-safe when compiled and run To study the problem of safe composition, we present Lightweight Feature Java (LFJ), an extension of Lightweight Java with support for features. We define a constraint-based type system for LFJ and prove its soundness using a full formalization of LFJ in Coq. In LFJ, soundness means that any composition of features that satisfies the typing constraints will generate a well-formed LJ program. If the constraints of a feature model imply these typing constraints then all programs allowed by the feature model are type-safe.

PDF of Presentation.

constraints, safe composition, feature models, verification, semantics

T. Thuem, D. Batory, and C. Kaestner. Reasoning about Edits to Feature Models. International Conference on Software Engineering (ICSE), May 2009.

Features express the variabilities and commonalities among programs in a software product line (SPL). A feature model defines the valid combinations of features, where each combination corresponds to a program in an SPL. SPLs and their feature models evolve over time. We classify the evolution of a feature model via modifications as refactorings, specializations, generalizations, or arbitrary edits. We present an algorithm to reason about feature model edits to help designers to determine how the program membership of an SPL has changed. Our algorithm takes two feature models as input (before and after edit versions), where the set of features in both models are not necessarily the same, and it determines the change classification. Our algorithm efficiently classifies edits to models that have thousands of features.

thuem, feature models, recommended design rule checking, kaestner
D. Batory. Program Kubes: Dimensions of Variability in Software Synthesis. Keynote at Variability Modeling in Software-Intensive Systems (VAMOS), January 2009, and the Software Product Line Evolution Workshop, Pune, India, February 2009.

It is common in software product lines to have orthogonal and interacting sets of features. In such cases, the design of a program can expressed as a matrix of program transformations, where both rows and columns denote features. The technique is called Origami, as the matrix is folded in precise ways (thereby composing transformations) until a scalar is produced. This scalar is an expression (a composition of transformations) that synthesizes the program when it is evaluated. Origami generalizes to n-dimensional arrays, where each axis defines a dimension of variability. Previously our understanding of Origami was intuitive; here we present a mathematical explanation of Origami that exposes its principles and integrates diverse topics: database technology (data cubes), mathematics (ideas from tensors and categories), programming languages (the expression problem), and software design (feature interactions).

PDF of presentation.

keynote, origami, categories, commuting diagrams, computational design, feature interactions, feature models, tensors
D. Batory, M. Azanza, and J. Saraiva. The Objects and Arrows of Computational Design. Keynote at Model Driven Engineering Languages and Systems (MODELS), October 2008.

Computational Design (CD) is a paradigm where both program design and program synthesis are computations. CD merges Model Driven Engineering (MDE) which synthesizes programs by transforming models, with Software Product Lines (SPL) where programs are synthesized by composing transformations called features. In this paper, basic relationships between MDE and SPL are explored using the language of modern mathematics.

PDF of Presentation.
Interesting blogs on MODELS keynote: blog1, blog2.

MDE, MDD, MDA, categories, keynote, recommended theory

E. Uzuncaova, D. Garcia, S. Khurshid, and D. Batory. Testing Software Product Lines Using Incremental Test Generation. Int. Symposium on Software Reliability Engineering (ISSRE), November 2008.

We present a novel specification-based approach for generating tests for products in a software product line. Given properties of features as first-order logic formulas, our approach uses SAT-based anaylsis to automatically generate test inputs for each product in a product line. To ensure soundness of generation, we introduce an automatic technique for mapping a formula that specifies a feature into a transformation that defines incremental refinement of test suites. Our experimental results using different data structure product lines show that our approach can provide an order of magnitude speed-up over conventional techniques.

testing, commuting diagrams, geodesic

D. Batory. A Modeling Language for Program Design and Synthesis. Advances in Software Engineering, E. Boerger and A. Cisternino (editors), LNCS 5316. Keynote at Brazilian Symposium on Software Engineering, October 2007.

Software engineers define structures called programs and use tools to manipulate, transform, and analyze them. A modeling language is needed to express program design and synthesis as a computation, and elementary algebra fits the bill. I review recent results in automated software design, testing, and maintenance and use the language of elementary mathematics to explain and relate them. Doing so outlines a general and simple way to express and understand the relationships between different topics in program synthesis.

boerger, Computational Design, commuting diagrams, keynote, geodesic

D. Batory. Using Modern Mathematics as an FOSD Modeling Langauge. Generative Programming and Component Engineering (GPCE), October 2008.

Modeling languages are a fundamental part of automated software development. MDD, for example, uses UML class diagrams and state machines as languages to define applications. In this paper, we explore how Feature Oriented Software Development (FOSD) uses modern mathematics as a modeling language to express the design and synthesis of programs in software product lines, but demands little mathematical sophistication from its users. Doing so has three practical benefits: (1) it offers a simple and principled mathematical description of how FOSD transforms, derives, and relates program artifacts, (2) it exposes previously unrecognized commuting relationships among tool chains, thereby providing new ways to debug tools, and (3) it reveals new ways to optimize software synthesis.

categories, Computational Design, commuting diagrams, geodesic

C.H.P. Kim, C. Kaestner, and D. Batory. On the Modularity of Feature Interactions. Generative Programming and Component Engineering (GPCE), October 2008.

Feature modules are the building blocks of programs in software product lines (SPLs). A foundational assumption of feature-based program synthesis is that features are composed in a predefined order. Recent work on virtual separation of concerns reveals a new model of feature interactions that shows that feature modules can be quantized as compositions of smaller modules called derivatives. We present this model and examine some of its unintuitive consequences, namely, that (1) a given program can be reconstructed by composing features in any order, and (2) the contents of a feature module (as expressed as a composition of derivatives) is determined automatically by a feature order. We show that different orders allow one to adjust the contents of a feature module to isolate and study the impact of interactions that a feature has with other features. Using derivatives, we show the utility of generalizing safe composition (SC), a basic analysis of SPLs that verifies program type-safety, to prove that every legal composition of derivatives (and thus any composition order of features) produces a type-safe program, which is a much stronger SC property.

Click here for code downloads..

kaestner, derivatives, feature interactions, safe composition, commuting diagrams, optional feature problem

S. Apel, C. Kaestner, and D. Batory. Program Refactoring using Functional Aspects. Generative Programming and Component Engineering (GPCE), October 2008.

A functional aspect is an aspect that has the semantics of a transformation; it is a function that maps a program to an advised program. Functional aspects are composed by function composition. In this paper, we explore functional aspects in the context of aspect-oriented refactoring. We show that refactoring legacy applications using functional aspects is just as flexible as traditional aspects in that (a) the order in which aspects are refactored does not matter, and (b) the number of potential aspect interactions is decreased. We analyze several aspectoriented programs of different sizes to support our claims.

refactoring, aspects, commuting diagrams, kaestner

A. Hemakumar. Finding Contradictions in Feature Models. Workshop on the Analysis of Software Product Lines (ASPL), September 2008.

A feature model defines each product in a product-line by a unique combination of features. Feature compatibilities are expressed as constraints in feature models and may be contradictory. We suggest a run-time approach to expose contradictions in feature models when they are uncovered. However, the emphasis of this paper is to explore the possibility of finding contradictions statically using model checking and an incremental consistency algorithm.

Click here for the feature models used in this work..

constraints, feature models

D. Batory and E. Boerger. Modularizing Theorems for Software Product Lines: The Jbook Case Study. Journal of Universal Computer Science (JUCS). Keynote at Abstract State Machine (ASM) Workshop, June 2007.

A goal of software product lines is the economical assembly of programs in a family of programs. In this paper, we explore how theorems about program properties may be integrated into feature-based development of software product lines. As a case study, we analyze an existing Java/JVM compilation correctness proof for defining, interpreting, compiling, and executing bytecode for the Java language. We show how features modularize program source, theorem statements and their proofs. By composing features, the source code, theorem statements and proofs for a program are assembled. The investigation in this paper reveals a striking similarity of the refinement concepts used in Abstract State Machines (ASM) based system development and Feature-Oriented Programming (FOP) of software product lines. We suggest to exploit this observation for a fruitful interaction of researchers in the two communities.

verification, semantics, boerger

G. Freeman, D. Batory, and G. Lavender. Lifting Transformational Models of Product Lines: A Case Study.. Int. Conference on Model Transformations (ICMT), 2008. ( see journal version.)

Model driven development (MDD) of software product lines (SPLs) merges two increasing important paradigms that synthesize programs by transformation. MDD creates programs by transforming models, and SPLs elaborate programs by applying transformations called features. In this paper, we present the design and implementation of a transformational model of a product line of scalar vector graphics and JavaScript applications. We explain how we simplified our implementation by lifting selected features and their compositions from our original product line (whose implementations were complex) to features and their compositions of another product line (whose specifications were simple). We used operators to map higher-level features and their compositions to their lower-level counterparts. Doing so exposed commuting relationships among transformations (features) in both product lines that helped validate our model and implementation.

higher-order transformations, MDD, MDE, MDA

S. Apel and D. Batory. How AspectJ is Used: An Analysis of Eleven AspectJ Programs.. TR MIP-0801, Dept. Informatics and Mathematics, University of Passau, Germany, April 2008.

While it is well-known that crosscutting concerns occur in many software projects, little is known on how aspect-oriented program- ming and in particular AspectJ have been used. In this paper, we analyze eleven AspectJ programs by different authors to answer the questions: which mechanisms are used, to what extent, and for what purpose. We found the code of these programs to be on average 86% objectoriented, 12% basic AspectJ mechanisms (introductions and method extensions), and 2% advanced AspectJ mechanisms (homogeneous advice or advanced dynamic advice). There is one class of crosscutting concerns which is mostly concerned with introductions and method extensions that matches this result well: collaborations. These results and our discussions with program authors indicate the bulk of coding activities was implementing collaborations. Several studies and researchers suggest that languages explicitly supporting collaborations are better suited than aspects a la AspectJ for this task.

Journal of Object Technology.

aspects

C.H.P. Kim, K. Czarnecki, and D. Batory. On-Demand Materialization of Aspects for Application Development.. Software Engineering Properties of Languages and Aspect Technologies (SPLAT), 2008.

Framework-based application development requires applications to be implemented according to rules, recipes and conventions that are documented or assumed by the framework’s Application Programming Interface (API), thereby giving rise to systematic usage patterns. While such usage patterns can be captured cleanly using conventional aspects, their variations, which arise in exceptional conditions, typically cannot be. In this position paper, we propose material- izable aspects as a solution to this problem. A materializable aspect behaves as a normal aspect for default joinpoints, but for exceptional joinpoints, it turns into a program transformation and analysis mechanism, with the IDE transforming the advice in-place and allowing the application developer to modify the materialized advice within the semantics of the aspect. We demonstrate the need for materializable aspects through a preliminary study of open-source SWT-based applications and describe our initial implementation of materializable aspects in Eclipse.

aspects
D. Batory and D. Smith. Finite Map Spaces and Quarks: Algebras of Program Structure. University of Texas at Austin, Dept. of Computer Sciences, TR-07-66.

We present two algebras that unify the disparate software composition models of Feature-Oriented Programming and Aspect-Oriented Programming. In one algebra, a finite map space underlies program synthesis, where adding finite maps and modifying their contents are fundamental operations. A second and more general algebra uses quarks, a construct that represents both expressions and their transformations. Special cases of our algebras correspond to existing systems and languages, and thus can serve as a foundation for next-generation tools and languages for feature-based program synthesis.

computational design
S. Apel, C. Lengauer, D. Batory, B. Moeller, and C. Kaestner. An Algebra for Feature-Oriented Software Development. Fakultät für Informatik und Mathematik, Universität Passau, MIP-0706, 2007

Feature-Oriented Software Development (FOSD) provides a multitude of formalisms, methods, languages, and tools for building variable, customizable, and extensible software. Along different lines of research different ideas of what a feature is have been developed. Although the existing approaches have similar goals, their representations and formalizations have not been integrated so far into a common framework. We present a feature algebra as a foundation of FOSD. The algebra captures the key ideas and provides a common ground for current and future research in this field, in which also alternative options can be explored

computational design, kaestner, moeller
E. Uzuncaova, D. Garcia, S. Khurshid, and D. Batory. A Specification-Based Approach to Testing Software Product Lines (Poster Paper) . SIGSOFT FSE, 2007

This paper presents a specification-based approach for systematic testing of products from a software product line. Our approach uses specifications given as formulas in Alloy, a first-order logic based on relations. Alloy formulas can be checked for satisfiability using the Alloy Analyzer. The fully automatic analyzer, given an Alloy formula and a scope, i.e., a bound on the universe of discourse, searches for an instance, i.e., a valuation to the relations in the formula such that it evaluates to true. The analyzer translates an Alloy formula (for the given scope) to a propositional formula and finds an instance using an off-the-shelf SAT solver. The use of an enumerating solver enables systematic test generation. We have developed a prototype based on the AHEAD theory. The prototype uses the recently developed Kodkod model finding engine of the Alloy Analyzer. We illustrate our approach using a data structure product line

testing, geodesic
S. Thaker, D. Batory, D. Kitchin, and W. Cook. Safe Composition of Product Lines Generative Programming and Component Engineering (GPCE), October 2007

Programs of a software product line can be synthesized by composing modules that implement features. Besides high-level domain constraints that govern the compatibility of features, there are also low-level implementation constraints: a feature module can reference elements that are defined in other feature modules. Safe composition is the guarantee that all programs in a product line are type safe: i.e., absent of references to undefined elements (such as classes, methods, and variables). We show how safe composition properties can be verified for AHEAD product lines using feature models and SAT solvers

constraints, recommended safe composition, verification
C. Kaestner, S. Apel, and D. Batory. A Case Study Implementing Features Using AspectJ. Software Product Line Conference (SPLC), September 2007

Software product lines aim to create highly configurable programs from a set of features. Common belief and recent studies suggest that aspects are well-suited for implementing features. We evaluate the suitability of AspectJ with respect to this task by a case study that refactors the embedded database system Berkeley DB into 38 features. Contrary to our initial expectations, the results were not encouraging. As the number of aspects in a feature grows, there is a noticeable decrease in code readability and maintainability. Most of the unique and powerful features of AspectJ were not needed. We document where AspectJ is unsuitable for implementing features of refactored legacy applications and explain why

aspects, kaestner
S. Trujillo. Feature Oriented Model Driven Product Lines . Ph.D. Dissertation, Department of Computer Sciences, The University of the Basque Country, Spain, March 2007.

Model Driven Development (MDD) is an emerging paradigm for software construction that uses models to specify programs, and model transformations to synthesize executables. Feature Oriented Programming (FOP) is a paradigm for software product lines where programs are synthesized by composing features. Feature Oriented Model Driven Development (FOMDD) is a blend of FOP and MDD that shows how programs in a software product line can be synthesized in an MDD way by composing models from features, and then transforming these models into executables. A case study on a product line of portlets, which are components of web portals, is used to illustrate FOMDD. This case reveals mathematical properties (i.e., commuting diagrams) of portlet synthesis that helped us to validate the correctness of our approach (abstractions, tools and specifications), as well as optimize portlet synthesis. These properties expose the nature of metaprograms for program synthesis (i.e., synthesis is a metaprogram combining primitive operations that form a geometry). We exploit this to synthesize metaprograms, which when executed, will synthesize a target program of a product-line. Specifically, we elaborate on the generation of metaprograms (not programs) from abstract specifications. This is the core of GROVE: the GeneRative metaprOgramming for Variable structurE approach. Variability of synthesis is also considered. Nonetheless, the ultimate envision is a structural theory behind program synthesis.

student, MDD, MDE, MDA, geodesic, higher-order transformations, kubes, origami, categories, commuting diagrams
C. Kaestner. Aspect-Oriented Refactoring of Berkeley DB . M.Sc. Thesis, Department of Technical and Business Information Systems, University of Magdeburg, Germany, March 2007.

Aspect-oriented programming has been repeatedly suggested to implement software product lines (SPL). In this thesis we create a large case study by refactoring the open source project Berkeley DB into an SPL with 38 features using AspectJ. We focus on expressibility, readability, the role of bounded quantification, and feature interactions. Contrary to our initial expectations, the results were not encouraging. We document problems encountered and suggest solutions and directions for further research.

refactoring, student, aspects, kaestner
S. Apel. The Role of Features and Aspects in Software Development . Ph.D. Dissertation, Department of Technical and Business Information Systems, University of Magdeburg, Germany, March 2007.

Feature-Oriented Programming (FOP) and Aspect-Oriented Programming (AOP) are complementary technologies. Though they aim at crosscutting modularity, they do so in different ways. We observed that FOP and AOP can be combined to overcome their individual limitations. Consequently, we propose Aspectual Feature Modules (AFMs), a representative approach that unifies AOP and FOP. From this symbiosis we derive the novel notion of Aspect Refinement (AR) that integrates aspects into the stepwise development philosophy of FOP. We use AFMs and AR in a non-trivial case study to create a product line of overlay networks. We also present a set of guidelines to assist programmers in how and when to use FOP and AOP techniques for implementing product lines in a stepwise and generative manner. Finally, we answer the question of how FOP and AOP-related implementation techniques are used today by analyzing a representative set of AspectJ programs of different sizes. We observe that aspects are used frequently for implementation problems that are closely related to FOP. We discuss why this is not surprising.

student, aspects
S. Trujillo, D. Batory, and O. Diaz Feature Oriented Model Driven Development: A Case Study for Portlets. International Conference on Software Engineering (ICSE), 2007

Model Driven Development (MDD) is an emerging paradigm for software construction that uses models to specify programs, and model transformations to synthesize executables. Feature Oriented Programming (FOP) is a paradigm for software product lines where programs are synthesized by composing features. Feature Oriented Model Driven Development (FOMDD) is a blend of FOP and MDD that shows how products in a software product line can be synthesized in an MDD way by composing models from features, and then transforming these models into executables. We present a case study of FOMDD on a product line of portlets, which are components of web portals. We reveal mathematical properties of portlet synthesis that helped us to validate the correctness of our abstractions, tools, and specifications, as well as optimize portlet synthesis

MDD, MDE, MDA, categories, commuting diagrams
D. Batory. Program Refactorings, Program Synthesis, and Model-Driven Design . Keynote at European Joint Conferences on Theory and Practice of Software (ETAPS) Compiler Construction Conference, April 2007.

Program refactoring, feature-based and aspect-oriented software synthesis, and model-driven development are disjoint research areas. However, they are all architectural metaprogramming technologies as they treat programs as values and use functions (a.k.a. transformations) to map programs to other programs. In this paper, I explore their underlying connections by reviewing recent advances in each area from an architectural metaprogramming perspective. I conjecture how these areas can converge and outline a theory that may unify them.

MDD, MDE, MDA, categories, commuting diagrams, refactoring, keynote, recommended theory
D. Batory. From Implementation to Theory in Product Synthesis . Keynote at Principles of Programming Languages (POPL), January 2007.

Future software development will rely on product synthesis, i.e., the synthesis of code and non-code artifacts for a target component or application. Prior work on feature-based product synthesis can be unified by applying elementary ideas from category theory. Doing so reveals (a) important and previously unrecognized properties that product synthesis tools must satisfy, and (b) non-obvious generalizations of current techniques that will guide future research efforts in automated product development.

Published version of this paper..

MDD, MDE, MDA, categories, commuting diagrams, computational design, geodesic
R. Lopez-Herrejon and D. Batory, Modeling Features in Aspect-Based Product Lines with Use Case Slices: An Exploratory Case Study." Aspect Oriented Modeling Workshop at Model Driven Engineering Languages and Systems (MODELS). Genoa, Italy 2006. Also, in Models in Software Engineering. Workshops and Symposia at MoDELS 2006, Genoa, Italy, October 1-6, 2006, Reports and Revised Selected Papers. Series: Lecture Notes in Computer Science, Vol. 4364 Sublibrary: Programming and Software Engineering 2007

A significant number of techniques that exploit aspects in software design have been proposed in recent years. One technique is use case slices by Jacobson and Ng, that builds upon the success of use cases as a common modeling practice. A use case slice modularizes the implementation of a use case and typically consists of a set of aspects, classes, and interfaces. Work on Feature Oriented Programming (FOP) has shown how features, increments in program functionality, can be modularized and algebraically modeled for the synthesis of product lines. When AspectJ is used in FOP, the structure of feature modules resembles that of use case slices. In this paper, we explore the relations between use case slices modeling and FOP program synthesis and describe their potential synergy for modeling and synthesizing aspect-based product lines

aspects, UML, award
S. Thaker. Design and Analysis of Multidimensional Program Structures . M.Sc. Thesis, December 2006, The Department of Computer Sciences, The University of Texas at Austin.

Software design is an art. Just as understanding of the structure of atomic elements is critical to the natural sciences, it too is fundamental to the science behind software engineering. Feature-Oriented Programming and compositional programming embody ideas of software decomposition structure. Within feature-decomposition, multidimensional structures are of particular interest. Multidimensional structures capture a higher-order structural relationship within conventional hierarchal decomposition. They simplify a complex decomposition's representation, understanding, and compositional specification. In this thesis we first overview multidimensional structures. We then identify two critical properties a decomposition must satisfy and devise algorithms for efficiently verifying them. The Safe Composition property guarantees lack of type-errors at composition-time. The Orthogonality property requires a multidimensional decomposition to be identical along all dimensions. In identifying and verifying key properties we have also demonstrated how a structural model can simplify our understanding and applicability of essential principles. When the fundamental structure is well-understood, software design is amenable to automation.

student, analysis, origami, safe composition, verification
D. Batory, D. Benavides, and A. Ruiz-Cortes. Automated Analyses of Feature Models: Challenges Ahead. CACM, Special Issue on Software Product Lines, December 2006

This paper describes recent advances in formalizing feature models and the challenges ahead in automating product specification and design

feature models
S. Apel and J. Liu. "On the Notion of Functional Aspects in Aspect-Oriented Refactoring". Keynote at Workshop on Aspects, Dependencies, and Interactions (ADI), Nantes, France, 2006

In this paper, we examine the notion of functional aspects in context of aspect-oriented refactoring. Treating aspects as functions reduces the potential interactions between aspects significantly. We propose a simple mathematical model that incorporates fundamental properties of aspects and their interactions. Using this model, we show that with regard to refactoring functional aspects are as flexible as traditional aspects, but with reduced program complexity

refactoring, aspects, keynote
S. Apel, D. Batory, and M. Rosenmueller "On the Structure of Crosscutting Concerns:Using Aspects or Collaborations?", 1st Workshop on Aspect-Oriented Product Line Engineering (AOPLE), October 2006

While it is well known that crosscutting concerns occur in many software projects, little is known about the inherent properties of these concerns nor how aspects (should) deal with them. We present a framework for classifying the structural properties of crosscutting concerns into (1) those that benefit from AOP and (2) those that should be implemented by OOP mechanisms. Further, we propose a set of code metrics to perform this classification. Applying them to a case study is a first to step toward revealing the current practice of AOP

aspects, rosenmueller
G.T. Leavens, J-R Abrial, D. Batory, M. Butler, A. Coglio, K. Fisler, E. Hehner, C. Jones, D. Miller, S. Peyton-Jones, M. Sitaraman, D.R. Smith, and A. Stump, "Roadmap for Enhanced Languages and Methods to Aid Verification", Generative Programming and Component Engineering (GPCE), October 2006

This road map describes ways that researchers in four areas - specification languages, program generation, correctness by construction, and programming languages - might help further the goal of verified software. It also describes what advances the "verified softwar" grand challenge might anticipate or demand from work in these areas. That is, the roadmap is intended to help foster collaboration between the grand challenge and these research areas. A common goal for research in these areas is to establish language designs and tool architectures that would allow multiple annotations and tools to be used on a single program. In the long term, researchers could try to unify these annotations and integrate such tools

verification, semantics
S. Apel and D. Batory. When to Use Features and Aspects? A Case Study ,Generative Programming and Component Engineering (GPCE), October 2006

Aspect-Oriented Programming (AOP) and Feature-Oriented Programming (FOP) are complementary technologies that can be combined to overcome their individual limitations. Aspectual Mixin Layers (AML) is a representative approach that unifies AOP and FOP. We use AML in a non-trivial case study to create a product line of overlay networks. We also present a set of guidelines to assist programmers in how and when to use AOP and FOP techniques for implementing product lines in a stepwise and generative manner

aspects
S. Trujillo, D. Batory, and O. Diaz. Feature Refactoring a Multi-Representation Program into a Product Line , Generative Programming and Component Engineering (GPCE), October 2006

Feature refactoring is the process of decomposing a program into a set of modules, called features, that encapsulate increments in program functionality. Different compositions of features yield different programs. As programs are defined using multiple representations, such as code, makefiles, and documentation, feature refactoring requires all representations to be factored. Thus, composing features produces consistent representations of code, makefiles, documentation, etc. for a target program. We present a case study of feature refactoring a substantial tool suite that uses multiple representations. We describe the key technical problems encountered, and sketch the tool support needed for simplifying such refactorings in the future

refactoring
M. Grechanik. Design and Analysis of Interoperating Components . Ph.D. dissertation, September 2006, The Department of Computer Sciences, The University of Texas at Austin.

Components are modular units (e.g., objects, modules, or programs) that interact by exchanging data. Two or more components interoperate when they exchange information. We propose an abstraction in which foreign objects (i.e., objects that are not defined in a host programming language) are abstracted as graphs and abstract operations access and manipulate them. Using this abstraction, we build a framework called Reification Object-Oriented Framework (ROOF) to reduce the multiplicity of platform API calls to a small set of operations on foreign objects that are common to all platforms. Based on our abstraction, we designed Foreign Object REification Language (FOREL), an extension for object-oriented languages whose type checking coupled with a conservative static analysis mechanism reports potential errors when referencing foreign components. Also, we design and build a tool for legacy code called a Verifier for Interoperating cOmponents for finding Logic fAults (Viola) that finds errors in components exchanging XML data.

student, analysis, verification
R.E. Lopez-Herrejon. Understanding Feature Modularity . Ph.D. dissertation, September 2006, The Department of Computer Sciences, The University of Texas at Austin.

An undesirable consequence is that developers must lower their abstractions from features to those provided by the underlying implementation languages, a process that is far from simple let alone amenable to significant automation. The conceptual gap created between feature abstractions and their modularization hinders program understanding and product line development. The root of the problem is the fact that feature modularity is not well understood and thus not well supported in conventional programming languages, modularization mechanisms, and design techniques. In this dissertation, we explore language and modularity support for features founded on an algebraic model geared for program synthesis. Our model integrates ideas from collaboration-based designs, mixin layers, aspect oriented programming, multidimensional separation of concerns, and generative programming. We assess our model with an implementation of a non-trivial product line case study, and evaluate feature support in emerging modularization technologies.

student, origami, aspects
D. Batory. Multi-Level Models in Model Driven Development, Product-Lines, and Metaprogramming , IBM Systems Journal, Volume 45, Number 3, 2006

Model Driven Development (MDD) aims to raise the level of abstraction in program specification and increase automation in program development. These are also the goals of product-lines (creating a family of related programs) and metaprogramming (programming as a computation). We show that the confluence of MDD, product-lines, and metaprogramming exposes a multilevel paradigm of program development, and further, we can use OO design techniques to represent programs, the metaprograms that produced these programs, and the meta-meta programs that produced these metaprograms, recursively. The paradigm is based on a small number of simple and well-known ideas, scales to the synthesis of applications of substantial size, and helps clarify concepts of MDD

MDD, MDE, MDA
J. Liu, D. Batory, and C. Lengauer. "Feature Oriented Refactoring of Legacy Applications", International Conference on Software Engineering 2006 (ICSE), Shanghai, China.

Feature oriented refactoring (FOR) is the process of decomposing a program into features, where a feature is an increment in program functionality. We develop a theory of FOR that relates code refactoring to algebraic factoring. Our theory explains relationships between features and their implementing modules, and why features in different programs of a product-line can have different implementations. We describe a tool and refactoring methodology based on our theory, and present a validating case study

refactoring, computational design, optional feature problem

R. Lopez-Herrejon, D. Batory, and C. Lengauer. "A Disciplined Approach to Aspect Composition", ACM SIGPLAN 2006 Symposium on Partial Evaluation and Program Manipulation (PEPM), January 2006, Charleston, SC

Aspect-oriented programming is a promising paradigm that challenges traditional notions of program modularity. Despite its increasing acceptance, aspects have been documented to suffer limited reuse, hard to predict behavior, and difficult modular reasoning. We develop an algebraic model that relates aspects to program transformations and uncovers aspect composition as a significant source of the problems mentioned. We propose an alternative model of composition that eliminates these problems, preserves the power of aspects, and lays an algebraic foundation on which to build and understand AOP tools

aspects, computational design
M. Grechanik, D.E. Perry, and D. Batory. A Scalable Security Mechanism For Component-Based Systems, Int. Conf. COTS-Based Software Systems, April 2006, Orlando FL

Security, scalability, and performance are critical for large-scale component-based applications. Weaving security solutions into the fabric of component-based architectures often worsens the scalability and performance of the resulting system. In this paper we analyze the sources of non-scalability and conduct an empirical study that shows that close to 80% of interactions between components and their clients in different commercial systems occur within protected security boundaries. Based on these findings we propose a novel scalable security mechanism for component based systems called Component Adaptive Scalable Secure Infrastructure Architecture (CASSIA). CASSIA utilizes the topology of the security boundaries and patterns of interactions among components to achieve noticeable improvements in scalability and performance for component-based applications. We conduct a case study that confirms the scalability of CASSIA, and propose a Secure COmponent Protocol (SCOP) that incorporates our mechanism into a component infrastructure

. D. BatoryA Tutorial on Feature Oriented Programming and the AHEAD Tool Suite, Proc. Summer School on Generative and Transformation Techniques in Software Engineering, Braga, Portugal, R. Laemmel, J. Saraiva, and J. Visser, ed., Vol 4143, Lecture Notes in Computer Science, Springer-Verlag, 2006

Feature oriented programming (FOP) is the study of feature modularity and its use in program synthesis. AHEAD is a theory of FOP that is based on a fundamental concept of generative programming that functions map programs. This enables the design of programs to be expressed compositionally as algebraic expressions, which are suited for automated analysis, manipulation, and program synthesis. This paper is a tutorial on FOP and AHEAD. We review AHEAD's theory and the tool set that implements it

E. Jung, C. Kapoor, and D. Batory. "Automatic Code Generation for Actuator Interfacing from a Declarative Specification", International Conference on Robotics and Systems (IROS), August 2005, Edmonton, Canada

Common software design practices use object-oriented (OO) frameworks that structure software in terms of objects, classes, and package; designers then create programs by inheritance and composition of classes and objects. Operational Software Components for Advanced Robotics (OSCAR) is one such framework for robot control software with abstractions for generalized kinematics, dynamics, performance criteria, decision making, and hardware interfacing. Even with OSCAR, writing new programs still requires a significant amount of manual labor. Feature-Oriented Programming (FOP) is method for software design that models and specifies programs in terms of features, where a feature encapsulates the common design decisions that occur in a domain. A set of features then forms a domain model for a Product Line Architecture. Product variants in this product line can then be generated from a declarative specification. FOP and related technologies are emerging software engineering techniques for automatically generating programs. Our research applies FOP to robot controller software. As an example, the domain of hardware interfacing is analyzed and 41 features identified. A GUI for specifying and generating programs is presented as well. Analysis of features shows 200 possible different programs could be generated

D. Batory. "Feature Models, Grammars, and Propositional Formulas", Software Product Line Conference (SPLC) September 2005.
This paper has won the SPLC Test of Time Award 2017.

Feature models are used to specify members of a product-line. Despite years of progress, contemporary tools provide limited support for feature constraints and offer little or no support for debugging feature models. We integrate prior results to connect feature models, grammars, and propositional formulas. This connection allows arbitrary propositional constraints to be defined among features and enables off-the-shelf satisfiability solvers to debug feature models. We also show how our ideas can generalize recent results on the staged configuration of feature models

award, constraints, feature models, recommended design rule checking
J. Liu, D. Batory, and S. Nedunuri. Modeling Interactions in Feature Oriented Designs, International Conference on Feature Interactions (ICFI), June 2005

Feature Oriented Programming (FOP) is a general theory of software development where programs are assembled by composing feature modules. A feature X interacts structurally with another feature Y by changing Y's source code. We advance FOP by proposing an algebraic theory of structural feature interactions that models feature interactions as derivatives. We use our theory to show how a legacy Java application can be refactored into a feature-based design

feature interactions, derivatives, optional feature problem
R. Lopez-Herrejon, D. Batory, and W. Cook. "Evaluating Support for Features in Advanced Modularization Technologies", European Conference on Object-Oriented Programming (ECOOP), July 2005

A software product-line is a family of related programs. Each program is defined by a unique combination of features, where a feature is an incremental change in program functionality. Modularizing features is difficult, as feature-specific code often cuts across class boundaries. New modularization technologies have been proposed in recent years, but their support for feature modules has not been thoroughly examined. In this paper, we propose a variant of the expression problem as a canonical problem in product-line design. The problem reveals a set of technology-independent properties that feature modules should exhibit. We use these properties to evaluate five technologies: AspectJ, Hyper/J, Jiazzi, Scala, and AHEAD. The results suggest an abstract model of feature composition that is technology-independent and that relates compositional reasoning with algebraic reasoning

origami, kubes, tensors, aspects
M. Grechanik, D.E. Perry, and D. Batory. Using AOP to Monitor and Administer Software for Grid Computing Environments, Int. Computer Software and Applications Conference (COMPSAC), Edinburgh, Scotland, July 2005

Monitoring is a task of collecting measurements that reflect the state of a system. Administration is a collection of tasks for control and manipulation of computer systems. Monitoring and Administering computer ResourceS (MARS) in a distributed grid computing environment (i.e. a distributed environment for coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations) is an important, expensive, and critical task. We present a novel solution based on applying crosscuts using binary rewriters and an event-based model that allows developers to create non-trivial MARS programs easily and uniformly. Our approach converts low-level API resource calls into system-wide events that MARS programs can monitor. This is accomplished by introducing advice that contains event-generating code at join points in programs that represent computer resources. We categorize low-level resource APIs by imposing a transactional metaphor to simplify the complexity of interactions between resources and to enable reasoning about MARS programs. We report both a case study and simulation that supports the viability of our approach.

R. Lopez-Herrejon and D. Batory. Improving Incremental Development in Aspectj by Bounding Quantification, Software Engineering Properties and Languages for Aspect Technologies (SPLAT), March 2005

Incremental software development is a process of building complex programs from simple ones by successively adding programmatic details. It is an effective and common design practice that helps control program complexity. However, incrementally building software using aspects can be more challenging than using traditional modules. AspectJ quantification mechanisms do not distinguish different developmental stages, and thus pointcuts can capture join points from later stages that they originally were not intended to advice. In this paper we present an algebraic model to describe aspects and their composition in AspectJ. We show that the way AspectJ's composes aspects plays a significant role in this problem. We propose an alternative model to compose aspects that improves the support for incremental development. It bounds the scope of quantification and still preserves the same power of AspectJ. We also show how bounded quantification contributes to aspect reuse

aspects, computational design
J. Blair and D. Batory. A Comparison of Generative Approaches: XVCL and GenVoca, Technical Report December 2004

We report one of the first comparative studies on two mature approaches to generative programming: XVCL and GenVoca. XVCL is the latest incarnation of Bassett's frames; GenVoca melds feature modularity with compositional programming. Both approaches explicitly rest on a pair of ideas: (1) parameterized functions return source code and (2) composing such functions synthesizes applications. Although their foundations are identical, XVCL and GenVoca have very different design philosophies. These differences raise an interesting debate on what design methodologies and programming constructs should be used in writing generative programs.
Code for this paper is here

D. Batory A Science of Software Design . Keynote at International Conference on Algebraic Methodology And Software Technology (AMAST) 2004.

Underlying large-scale software design and program synthesis are simple and powerful algebraic models. In this paper, I review the elementary ideas upon which these algebras rest and argue that they define the basis for a science of software design.

computational design
D. Batory Program Comprehension in Generative Programming: A History of Grand Challenges . Keynote at International Workshop on Program Comprehension (IWPC) 2004.

The communities of Generative Programming (GP) and Program Comprehension (PC) look at similar problems: GP derives a program from a specification, PC derives a specification from a program. A basic difference between the two is GP’s use of specific knowledge representations and mental models that are essential for program synthesis. In this paper, I present a historical review of the Grand Challenges, results, and outlook for GP as they pertain to PC.

keynote
J. Liu and D. Batory. " Automatic Remodularization and Optimized Synthesis of Product-Families, Generative Programming and Component Engineering (GPCE), October 2004

A product-family is a suite of integrated tools that share a common infrastructure. Program synthesis of individual tools can replicate common code, leading to unnecessarily large executables and longer build times. In this paper, we present remodularization techniques that optimize the synthesis of product-families. We show how tools can be automatically remodularized so that shared files are identified and extracted into a common package,and how isomorphic class inheritance hierarchies can be merged into a single hierarchy. Doing so substantially reduces the cost of program synthesis, product-family build times, and executable sizes. We present a case study of a product-family with five tools totalling over 170K LOC, where our optimizations reduce archive size and build times by 40%

recommended optimization
D. Batory, J.N. Sarvela, and A. Rauschmayer. Scaling Step-Wise Refinement , IEEE Transactions on Software Engineering (IEEE TSE), June 2004. This is a journal version of our 2002 ICSE paper

Step-wise refinement is a powerful paradigm for developing a complex program from a simple program by adding features incrementally. We present the AHEAD (Algebraic Hierarchical Equations for Application Design) model that shows how step-wise refinement scales to synthesize multiple programs and multiple noncode representations. AHEAD shows that software can have an elegant, hierarchical mathematical structure that is expressible as nested sets of equations. We review a tool set that supports AHEAD. As a demonstration of its viability, we have bootstrapped AHEAD tools from equational specifications, refining Java and non-Java artifacts automatically; a task that was accomplished only by ad hoc means previously

computational design, award, recommended theory
M. Grechanik, D. Batory, and D.E. PerryDesign of Large-Scale Polylingual Systems. International Conference on Software Engineering (ICSE), Edinburgh, Scotland, UK, May 2004

Building systems from existing applications written in two or more languages is common practice. Such systems are polylingual. Polylingual systems are relatively easy to build when the number of APIs needed to achieve language interoperability is small. However, when the number of distinct APIs become large, maintaining and evolving polylingual systems becomes a notoriously difficult task. In this paper, we present a simple, practical, and effective way to develop, maintain, and evolve large-scale polylingual systems. Our approach relies on recursive type systems whose instances can be manipulated by reflection. Foreign objects (i.e. objects that are not defined in a host programming language) are abstracted as graphs and path expressions are used for accessing and manipulating data. Path expressions are implemented by type reification which turns foreign type instances into first-class objects and enabling access to and manipulation of them in a host programming language. Doing this results in multiple benefits, including coding simplicity and uniformity that we demonstrate in a complex commercial project

. D. Batory. Feature-Oriented Programming and the AHEAD Tool Suite, International Conference on Software Engineering (ICSE), Edinburgh, Scotland, 2004

Feature Oriented Programming (FOP) is an emerging paradigm for application synthesis, analysis, and optimization. A target application is specified declaratively as a set of features, like many consumer products (e.g., personal computers, automobiles). FOP technology translates such declarative specifications into efficient programs. AHEAD is a model of FOP that is based on step-wise refinement, which advocates that complex programs can be synthesized from simple programs by incrementally adding features. The AHEAD Tool Suite (ATS) supports program development in AHEAD. AHEAD and ATS are among the most advanced models/tools for large-scale program synthesis

M. Grechanik, D.E. Perry, and D. BatoryReengineering Large-Scale Polylingual Systems. International Workshop on Incorporating COTS into Software Systems: Tools and Techniques (IWICSS). co-located with the 3rd International Conference on COTS-Based Software Systems (ICCBSS), Redondo Beach, CA, February 2004

Building systems from existing applications written in two or more languages is common practice. Such systems are polylingual. Polylingual systems are relatively easy to build when the number of APIs needed to achieve language interoperability is small. However, when the number of distinct APIs become large, maintaining and evolving them becomes a notoriously difficult task. We present a practical and effective process to reengineer large-scale polylingual systems. We offer a programming model that is based on the uniform abstraction of polylingual systems as graphs and the use of path expressions for traversing and manipulating data. This model enables us to achieve multiple benefits, including coding simplicity and uniformity (where neither was present before) that facilitate further reverse engineering. By performing control and data flow analyses of polylingual systems we infer the schemas used by all participating programs and the actions performed by each program on others. Finally, we describe a tool called FORTRESS that automates our reverse engineering process. The contribution of this paper is a process that allows programmers to reverse engineer foreign type systems and their instances semiautomatically at the highest level of design. We know of no other approach with comparable benefits

D. Batory, J. Liu, J.N. Sarvela. Refinements and Multi-Dimensional Separation of Concerns, ACM SIGSOFT FSE 2003

Step-wise refinement (SWR) asserts that complex programs can be derived from simple programs by progressively adding features. The length of a program specification is the number of features that the program has. Critical to the scalability of SWR are multi-dimensional models that separate orthogonal feature sets. Let n be the dimensionality of a model and k be the number of features along a dimension. We show program specifications that could be O(k^**n) features long have short and easy-to-understand specifications of length O(kn) when multi-dimensional models are used. We present new examples of multidimensional models: a micro example of a product-line (whose programs are 30 lines of code) and isomorphic macro examples (whose programs exceed 30K lines of code). Our work provides strong evidence that SWR scales to synthesis of large systems

recommended origami, kubes, tensors
D. Batory, J.N. Sarvela, and A. Rauschmayer. Scaling Step-Wise Refinement, International Conference on Software Engineering (ICSE), Portland, Oregon, 2003

Step-wise refinement is a powerful paradigm for developing a complex program from a simple program by adding features incrementally. We present the AHEAD (Algebraic Hierarchical Equations for Application Design) model that shows how step-wise refinement scales to synthesize multiple programs and multiple non-code representations. AHEAD shows that software can have an elegant, hierarchical mathematical structure that is expressible as nested sets of equations. We review a tool set that supports AHEAD. As a demonstration of it's viability, we have bootstrapped AHEAD tools solely from equational specifications, generating Java and non-Java artifacts automatically, a task that was accomplished only by ad hoc means previously

computational design
D. Batory. The Road to Utopia: A Future for Generative Programming . Keynote at Dagstuhl Seminar on Domain-Specific Program Generation, March 2003.

The future of software engineering lies in automation, and will exploit the combined strengths of generative programming, domain-specific languages, and automatic programming. While each of these areas is still in its infancy, a spectacularly successful example of their combination was realized twenty-five years ago: relational query optimization. In this paper, I chart the successes and mindset used by database researchers to generate efficient query processing programs automatically. I argue that the road that they have so successfully followed is the same road that the generative programming, domain-specific languages, and automatic programming communities are now traversing.

keynote
M. Grechanik, D.E. Perry, and D. Batory. An Approach to Evolving Database Dependent Systems . International Workshop on Principles of Software Evolution, Orlando, Florida, May 2002.

It is common in client/server architectures for components to have SQL statements embedded in their source code. Components submit queries to relational databases using such standards as Universal Data Access (UDA) and Open Database Connectivity (ODBC). The API that implements these standards is complex and requires the embedding of SQL statements in the language that is used to write the components. Such programming practices are widespread and result in increased complexity in maintaining systems. We propose an approach that eliminates the embedding of SQL in programming languages, thereby enabling the automation of important software maintenance tasks.

evolution
D. Batory, R. Lopez-Herrejon, and J-P. Martin. Generating Product-Lines of Product-Families, Automated Software Engineering Conference (ASE), 2002

GenVoca is a methodology and technology for generating product-lines, i.e., building variations of a program. The primitive components from which applications are constructed are refinements or layers, which are modules that implement a feature that many programs of a product-line can share. Unlike conventional components (e.g., COM, CORBA, EJB), a layer encapsulates fragments of multiple classes. Sets of fully-formed classes can be produced by composing layers. Layers are modular, albeit unconventional, building blocks of programs.But what are the building blocks of layers? We argue that facets is an answer. A facet encapsulates fragments of multiple layers, and compositions of facets yields sets of fully-formed layers. Facets arise when refinements scale from producing variants of individual programs to producing variants of multiple, integrated programs, as typified by product families (e.g., MS. Office). We present a mathematical model that explains relationships between layers and facets. We use the model to develop a generator for tools (i.e., product-family) that are used in language-extensible Integrated Development Environments (IDEs).
Example is here Extended draft is here

origami, tensors, kubes, award
D. Batory, C. Johnson, B. MacDonald, and D. von Heeder. Achieving Extensibility Through Product-Lines and Domain-Specific Languages: A Case Study ", ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 11#2, April 2002, 191-214. This is a journal version of our 2000 ICSR paper

This is a case study in the use of product-line architectures (PLAs) and domain-specific languages (DSLs) to design an extensible command-and-control simulator for Army fire support. The reusable components of our PLA are layers or "aspects" whose addition or removal simultaneously impacts the source code of multiple objects in multiple, distributed programs. The complexity of our component specifications is substantially reduced by using a DSL for defining and refining state machines, abstractions that are fundamental to simulators. We present preliminary results that show how our PLA and DSL synergistically produce a more flexible way of implementing state-machine-based simulators than is possible with a pure Java implementation

award
M.Grechanik, D. Batory, and Dewayne E. Perry. Integrating and Reusing GUI-Driven Applications, International Conference on Software Reuse (ICSR), Austin, Texas, April 2002

Graphical User Interface (GUI) Driven Applications (GDAs) are ubiquitous. We present a model and techniques that take closed and monolithic GDAs and integrate them into an open, collaborative environment. The central idea is to objectify the GUI of a GDA, thereby creating an object that enables programmatic control of that GDA. We demonstrate a non-trivial application of these ideas by integrating a standalone internet application with a stand-alone Win32 application, and explain how PDAs (Personal Digital Assistants) can be used to remotely control their combined execution. Further, we explain how Integrated Development Environment (IDEs) may be extended to integrate and reuse GDAs using our approach. We believe our work is unique: we know of no other technology that could have integrated the GDAs of our example

Y.Smaragdakis and D. Batory. Mixin Layers: An Object-Oriented Implementation Technique for Refinements and Collaboration-Based Designs, ACM Transactions on Software Engineering and Methodology (TOSEM), April 2002

A "refinement" is a functionality addition to a software project that can affect multiple dispersed implementation entities (functions, classes, etc.). In this paper, we examine large-scale refinements in terms of a fundamental object-oriented technique called collaboration-based design. We explain how collaborations can be expressed in existing programming languages or be supported with new language constructs (which we have implemented as extensions to the Java language). We present a specific expression of large-scale refinements called mixin layers, and demonstrate how it overcomes the scalability difficulties that plagued prior work. We also show how we used mixin layers as the primary implementation technique for building an extensible Java compiler, JTS

mixin layers
A. Rauschmayer. Probe: Visualizing Algebraic Transformations in the GenBorg Generative Software Environment, Institut fuer Informatik Ludwig Maximilians Universitaet Muenchen, December, 2001

Software is becoming increasingly pervasive and complex. The greatest challenge for the industry today is how to produce more code without compromising quality or cost. GenBorg is a programming paradigm whose answer is to scale the units of reuse from components of individual applications to components of application suites (e. g., MS Office). While traditional approaches trade size of code units for flexibility, GenBorg uses techniques from generative programming and feature-oriented programming in order to provide models that instantiate code and non-code artifacts of customizable application suites. Because developing software is not limited to writing code, but current ways of automating development are, a lot of human energy is wasted for keeping related data such as documentation and formal properties consistent with source code. GenBorg manages to evolve all these artifacts automatically and in parallel. GenBorg's ideas are backed by a theoretical foundation, the GenBorg algebra. The Java application Probe is a first implementation of the GenBorg programming paradigm and uses a simple, file-based data structure for defining GenBorg models. Its graphical user interface makes it easy to navigate through and instantiate from large models

student
D. Batory, D. Brant, M. Gibson, and M. Nolen. ExCIS: An Integration of Domain-Specific Languages and Feature-Oriented Programming, Workshop on New Visions for Software Design and Productivity: Research and Applications, Vanderbilt University, Nashville, Tennessee, December 13-14, 2001

ExCIS is a methodology and tool suite that integrates the technologies of domain-specific languages (DSLs) and feature-oriented programming (FOP). DSLs offer compact specifications of programs in domain-specific notations that are easier to write, maintain, and evolve. FOP raises the level of abstraction in system programming from compositions of code-centric components to compositions of modules that implement individual and largely orthogonal features. ExCIS provides state-of-the-art tools for creating easier-to-specify, easier-to-maintain, and easier-to-change systems. It is being developed for STRICOM to create next-generation simulators for the U.S. Army

R. E. Lopez-Herrejon and D. Batory. A Standard Problem for Evaluating Product-Line Methodologies, Generative and Component-Based Software Engineering (GCSE/GPCE), September 9-13, 2001 Messe Erfurt, Erfurt, Germany

We propose a standard problem to evaluate product-line methodologies. It relies on common knowledge from Computer Science, so that domain-knowledge can be easily acquired, and it is complex enough to expose the fundamental concepts of product-line methodologies. As a reference point, we present a solution to this problem using the GenVoca design methodology. We explain a series of modeling, implementation, and benchmarking issues that we encountered, so that others can understand and compare our solution with theirs

GPL

E.W. Dijkstra. What is Refinement? E.W.D. Memo, August 30, 2001.

This is E.W. Dijkstra's response to the question 'What is a Refinement?

semantics
K. Fisler, S. Krishnamurthi, D. Batory, and J. Liu. A Model Checking Framework for Layered Command and Control Software, Workshop on Engineering Automation for Software Intensive System Integration, Monterrey, California, June 18-22, 2001

Most existing modular model checking techniques betray their hardware roots: they assume that modules compose in parallel. In contrast, layered software systems, which have proven successful in many domains, are really quasi-sequential compositions of parallel compositions. Most such systems demand and inspire new modular verification techniques. This paper presents algorithms that exploit a layered (or feature-based) decomposition to drive verification. Our technique can verify most properties locally within a layer; we also characterize when a global state space construction is unavoidable. This work is motivated by our efforts to verify a military fire simulation and support software system called FSATS

verification, semantics
K. Fisler, S. Krishnamurthi, and D. Batory. Verifying Component-Based Collaboration Designs, 4th ICSE Workshop on Component-Based Software Engineering: Component Certification and System Prediction. Toronto, Ontario, Canada, May 2001

verification, semantics
Y. Smaragdakis and D. Batory. Mixin-Based Programming in C++, Second International Symposium on Generative and Component-Based Software Engineering (GCSE now GPCE), Erfurt, Germany, October 9-12, 2000

Combinations of C++ features, like inheritance, templates, and class nesting, allow for the expression of powerful component patterns. In particular, research has demonstrated that, using C++ mixin classes, one can express layered component-based designs concisely with efficient implementations. In this paper, we discuss pragmatic issues related to component-based programming using C++ mixins. We explain surprising interactions of C++ features and policies that sometimes complicate mixin implementations, while other times enable additional functionality without extra effort

mixin layers
D. Batory, C. Johnson, B. MacDonald, and D. von Heeder., FASTS: An Extensible C4I Simulator for Army Fire Support, Workshop on Product Lines for Command-and-Control Ground Systems at the First International Software Product Line Conference (SPLC) in Denver, Colorado, August 2000
L. Tokuda and D. Batory. Evolving Object-Oriented Designs with Refactorings. Journal of Automated Software Engineering 8, 89-120, 2001. This is an enlarged version of our 1999 ASE paper

Refactorings are behavior-preserving program transformations that automate design evolution in object-oriented applications. Three kinds of design evolution are: schema transformations, design pattern micro-architectures, and the hot-spot-driven-approach. This research shows that all three are automatable with refactorings. A comprehensive list of refactorings for design evolution is provided and an analysis of supported schema trans-formations, design patterns, and hot-spot meta patterns is presented. Further, we evaluate whether refactoring technology can be transferred to the mainstream by restructuring non-trivial C++ applications. The applications that we examine were evolved manually by software engineers. We show that an equivalent evolution could be reproduced significantly faster and cheaper by applying a handful of general-purpose refactorings. In one application, over 14K lines of code were transformed automatically that otherwise would have been coded by hand. Our experiments identify benefits, limitations, and topics of further research related to the transfer of refactoring technology to a production environment

refactoring award
D. Batory. Refinements and Separation of Concerns. Second Workshop on Multi-Dimensional Separation of Concerns, International Conference on Software Engineering (ICSE), Limerick, Ireland, 2000

Today's notions of encapsulation are very restricted - a module or component contains only source code. What we really need is for modules or components to encapsulate not only source code that will be installed when the component is used, but also encapsulate corresponding changes to documentation, formal properties, and performance properties - i.e., changes to the central concerns of software development. The general abstraction that encompasses this broad notion of encapsulation is called a "refinement"

D. Batory, Rich Cardone, and Y. Smaragdakis. Object-Oriented Frameworks and Product-Lines. 1st Software Product-Line Conference (SPLC), Denver, Colorado, August 2000

Frameworks are a common object-oriented code-structuring technique that is used in application product-lines. A framework is a set of abstract classes that embody an abstract design; a framework instance is a set of concrete classes that subclass abstract classes to provide an executable subsystem. Frameworks are designed for reuse: abstract classes encapsulate common code and concrete classes encapsulate instance-specific code. Unfortunately, this delineation of reusable vs. instance-specific code is problematic. Concrete classes of different framework instances can have much in common and there can be variations in abstract classes, all of which lead to unnecessary code replication. In this paper, we show how to overcome these limitations by decomposing frameworks and framework instances into primitive and reusable components. Doing so reduces code replication and creates a component-based product-line of frameworks and framework instances

mixin layers, recommended frameworks
D. Batory, Clay Johnson, Bob MacDonald, and Dale von Heeder. Achieving Extensibility Through Product-Lines and Domain-Specific Languages: A Case Study, International Conference on Software Reuse (ICSR), Vienna, Austria, 2000. Here is a pointer to an expanded version of this paper

This is a case study in the use of product-line architectures (PLAs) and domain-specific languages (DSLs) to design an extensible command-and-control simulator for Army fire support. The reusable components of our PLA are layers or "aspects" whose addition or removal simultaneously impacts the source code of multiple objects in multiple, distributed programs. The complexity of our component specifications is substantially reduced by using a DSL for defining and refining state machines, abstractions that are fundamental to simulators. We present preliminary results that show how our PLA and DSL synergistically produce a more flexible way of implementing state-machine-based simulators than is possible with a pure Java implementation

mixin layers
Y. Smaragdakis and D. Batory. Scoping Constructs for Program Generators ,Generative and Component-Based Software Engineering (GCSE now GPCE), Erfurt, Germany, September 1999

A well-known problem in program generation is scoping. When identifiers (i.e., symbolic names) are used to refer to variables, types, or functions, program generators must ensure that generated identifiers are bound to their intended declarations. This is the standard scoping issue in programming languages, only automatically generated programs can quickly become too complex and maintaining bindings manually is hard. In this paper we present generation scoping: a language mechanism to facilitate the handling of scoping concerns. Generation scoping offers control over identifier scoping beyond the scoping mechanism of the target programming language (i.e., the language in which the generator output is expressed). Generation scoping was originally implemented as an extension of the code template operators in the Intentional Programming platform, under development by Microsoft Research. Subsequently, generation scoping has also been integrated in the JTS language extensibility tools. The capabilities of generation scoping were invaluable in the implementation of two actual software generators: DiSTiL (implemented using the Intentional Programming system), and P3 (implemented using JTS).
P3
Y. Smaragdakis and D. Batory. Application Generators, based on an article in the Software Engineering volume of the Encyclopedia of Electrical and Electronics Engineering, (John Wiley and Sons)
D. Batory and Y. Smaragdakis. Building Product-Lines with Mixin-Layers ,ECOOP 99 Workshop on Product-Line Architectures

A mixin-layer is a building block for assembling applications of a product-line. We explain mixin-layers, their relationship to collaboration-based designs, layered designs, and GenVoca. We also summarize some of the product-lines that we have built using mixin-layers

mixin layers
D. Batory, G. Chen, E. Robertson, and T. Wang. Design Wizards and Visual Programming Environments for GenVoca Generators, IEEE Transactions on Software Engineering (IEEE TSE), May 2000, 441-452. This is a journal version of our 1998 ICSR paper

Domain-specific generators will increasingly rely on graphical languages for declarative specifications of target applications. Such languages will provide front-ends to generators and related tools to produce customized code on demand. Critical to the success of this approach will be domain-specific design wizards, tools that guide users in their selection of components for constructing particular applications. In this paper, we present the P3 ContainerStore graphical language, its generator, and design wizard

P3, recommended optimization, wizards, award
L.A. Tokuda. Evolving Object-Oriented Designs with Refactorings . Ph.D. dissertation, September 1999, The Department of Computer Sciences, The University of Texas at Austin.

Refactorings are behavior-preserving program transformations that automate design evolution in object-oriented applications. Three kinds of design evolution are: schema transformations, the introduction of design pattern micro-architectures, and the hot-spot-driven-approach. This research shows that all three are automatable with refactorings. A comprehensive list of refactorings for design evolution is provided and an analysis of supported schema transformations, design patterns, and hot-spot meta patterns is presented. Further, we evaluate whether refactoring technology can be transferred to the mainstream by restructuring non-trivial C++ applications. The applications that we examine were evolved manually by software engineers. We show that an equivalent evolution could be reproduced significantly faster and cheaper by applying a handful of general-purpose refactorings. In one application, over 14K lines of code were transformed automatically that otherwise would have been coded by hand. Our experiments identify benefits, limitations, and topics of further research related to the transfer of refactoring technology to a production environment.

student refactoring analysis design patterns
Y. Smaragdakis. Implementing Large-Scale Object-Oriented Components . Ph.D. dissertation, September 1999, The Department of Computer Sciences, The University of Texas at Austin.

The complexity of software has driven both researchers and practitioners toward design methodologies that decompose problems into intellectually manageable pieces and that assemble partial products into complete software artifacts. Modularity in design, however, rarely translates into modularity at the implementation level. Hence, an important problem is to provide implementation (i.e., programming language) support for expressing modular designs concisely. This dissertation shows that software can be conveniently modularized using large-scale object-oriented software components. Such large-scale components encapsulate multiple classes but can themselves be viewed as classes, as they support the object-oriented mechanisms of encapsulation and inheritance. This conceptual approach has several concrete applications. First, existing language mechanisms, like a pattern of inheritance, class nesting, and parameterization, can be used to simulate large-scale components called mixin layers. Mixin layers are ideal for implementing certain forms of object-oriented designs and result in simpler and more concise implementations than those possible with previous methodologies. Second, we propose new language constructs to provide better support for component-based programming. These constructs express components cleanly (i.e., without unwanted interactions with other language mechanisms) and address the issue of component type-checking.

mixin layers, student
L. Tokuda and D. Batory. Evolving Object-Oriented Architectures with Refactorings Automated Software Engineering (ASE), October 1999.
This paper has jointly won the 2013 ASE Most Influential Paper Award. (Nov 2013)

This research evaluates whether refactoring technology can make a successful transition to the mainstream by restructuring non-trivial evolving C++ applications. Results reveal that refactorings can safely automate significant architectural changes. In one example, 14K lines of code are transformed

refactoring, award
D. Batory, Y. Smaragdakis, and Lou Coglianese. Architectural Styles as Adaptors Software Architecture, Kluwer Academic Publishers, Patrick Donohoe, ed., 1999

The essence of architectural styles is component communication. In this paper, we try to relate architectural styles to adaptors in the GenVoca model of software construction. GenVoca components are refinements that can have a wide range of implementations, from binaries to rule-sets of program transformation systems. We suggest that abstracting adaptors to refinements allows for program transformation implementations of adaptors that can express complex architectural styles that could not be expressed by other means. Examples from avionics are given
Y. Smaragdakis and D. Batory Implementing Layered Designs with Mixin Layers. European Conference on Object-Oriented Programming, (ECOOP), July 1998

Mixin layers are a technique for implementing layered object-oriented designs (e.g., collaboration-based designs). Mixin layers are similar to abstract subclasses (mixin classes) but scaled to a multiple-class granularity. We describe mixin layers from a programming language viewpoint, discuss checking the consistency of a mixin layer composition, and analyze the language support issues involved
L. Tokuda and D. Batory Automating Three Modes of Evolution for Object-Oriented Software Architectures 5th Conference on Object-Oriented Technologies (COOTS), May 1999

Refactorings are behavior preserving program transformations that can be applied to compilable source code to automate design level changes. This paper demonstrates that three common forms of object-oriented architectural evolution can be automated with refactorings: schema transformations, design pattern microarchitectures, and hot-spot meta patterns
D. Batory Product-Line Architectures . Keynote at Smalltalk und Java in Industrie and Ausbildung, Erfurt, Germany, October 1998.

Today's software design methodologies are aimed at one-of-a-kind applications, designs are expressed in terms of objects and classes, and software must be coded manually. We argue that future software development will be very different and will center around product-line architectures (i.e., designs for families of related applications), refinements (a generalization of today's components), and software plug-and-play (a codeless form of programming).

keynote
D. Batory, Bernie Lofaso, and Y. Smaragdakis JTS: Tools for Implementing Domain-Specific Languages .5th International Conference on Software Reuse (ICSR), Victoria, Canada, June 1998

The Jakarta Tool Suite (JTS) aims to reduce substantially the cost of generator development by providing domain-independent tools for creating domain-specific languages and component-based generators called GenVoca generators. JTS is a set of precompiler-compiler tools for extending industrial programming languages (e.g., Java) with domain-specific constructs. JTS is itself a GenVoca generator, where precompilers for JTS-extended languages are constructed from components

mixin layers
Y. Smaragdakis and D. Batory. . Implementing Reusable Object-Oriented Components. 5th International Conference on Software Reuse (ICSR), Victoria, Canada, June 1998

Object-oriented (OO) classes are generally not reusable because they are not meaningful in isolation; most classes only have meaning as members of cooperating suites of classes (e.g., design patterns). These suites usually arise in designs, but rarely exist as encapsulated entities in OO implementations. In this paper we present a method for directly mapping cooperating suites of classes into encapsulated C++ implementations. Our method is an improvement over the VanHilst and Notkin approach for implementing collaboration-based designs and constitutes a step towards more reusable (object-oriented) components.

(If you are thinking of using the C++ technique described in this paper, read some implementation advice: Y. Smaragdakis, >Practical Advice on Using Mixin Layers (and Mixins) in C++

mixin layers
D. Batory, G. Chen, E. Robertson, and T. Wang. . Web-Advertised Generators and Design Wizards. 5th International Conference on Software Reuse (ICSR), Victoria, Canada, June 1998

Domain-specific generators will increasingly rely on Web-based applets for declarative specifications of target applications. Applets will communicate with generators via servers to produce customized code on demand. Critical to the success of this approach will be domain-specific design wizards, tools that guide users in their selection of components for constructing particular applications. In this paper, we present the P3 ContainerStore applet, its generator, and design wizard

P3 optimization, wizards
Y. Smaragdakis and D. Batory. DiSTiL: A Transformation Library for Data Structures. USENIX Conference on Domain-Specific Languages, October 1997

DiSTiL is a representative of a new approach to domain-specific language implementation. Instead of being the usual one-of-a-kind stand-alone compiler, DiSTiL is an extension library for the Intentional Programming (IP) transformation system (currently under development by Microsoft Research). DiSTiL relies on several reusable, general purpose infrastructure tools offered by IP that substantially simplify DSL implementation.
P3
D. Batory, D. Miranker, and D. Brant. Jakarta: A Tool Suite for Constructing Software Generators

Jakarta is a set of compiler tools to build extensible languages and to build compilers through component composition. Jakarta is being written in a Jakarta-produced superset of the Java language. We describe briefly how Jakarta will be used to build tools to automate OO design patterns, and the design wizard tool, which captures domain-specific knowledge of software architectures using domain-specific algebras
G. Jimenez-Perez and D. Batory. Memory Simulators and Software Generators, 1997 Symposium on Software Reuse (SSR), 136-145

We present results on re-engineering a highly tuned and hand-coded memory simulator using the P2 container data structure generator. This application was chosen because synthesizing the simulator's data structures would not exploit P2's primary advantage of automatically applying sophisticated code optimization techniques. Thus we initially believed that using P2 would be an overkill and that P2's generated code would provide no performance advantages over hand coding. On the contrary, we found that P2 produced more efficient code and that it offered significant advantages to software development that we had not previously realized.
P3
D. Batory and B. J. Geraci. Composition Validation and Subjectivity in GenVoca Generators, IEEE Transactions on Software Engineering (IEEE TSE) (special issue on Software Reuse), February 1997, 67-82. This is a journal version of papers Validating Component Compositions in Software System Generators. and Subjectivity and GenVoca Generators..

GenVoca generators synthesize software systems by composing components from reuse libraries. GenVoca components are designed to export and import standardized interfaces, and thus be plug-compatible, interchangeable, and interoperable with other components. In this paper, we examine two different but important issues in software system synthesis. First, not all syntactically correct compositions of components are semantically correct. We present simple, efficient, and domain-independent algorithms for validating compositions of GenVoca components. Second, components that export and import immutable interfaces are too restrictive for software system synthesis. We show that the interfaces and bodies of GenVoca components are subjective, i.e., they mutate and enlarge upon instantiation. This mutability enables software systems with customized interfaces to be composed from components with "standardized" interfaces

constraints, safe composition, award, design rules, DRC, recommended design rule checking
D. Batory Intelligent Components and Software Generators . Keynote at Software Quality Institute Symposium on Software Reliability, Austin, Texas, April 1997

The production of well-understood software will eventually be the responsibility of software generators. Generators will enable high-performance, customized software systems and subsystems to be assembled quickly and cheaply from component libraries. These components will be intelligent: they will encapsulate domain-specific knowledge (e.g., 'best practice' approaches) so that their instances will automatically customize and optimize themselves to the system in which they are being used. In this paper, we explore the topics intelligent components and software generation as they pertain to the issues of software productivity, performance, reliability, and quality.

keynote
E.E. Villarreal and D. Batory. Rosetta: A Generator of Data Language Compilers,. 1997 Symposium on Software Reuse (SSR), 146-156

A data language is a declarative language that enables database users to access and manipulate data. There are families of related data languages; each family member is targeted for a particular application. Unfortunately, building compilers for such languages is largely an ad hoc process; there are no tools and design methods that allow programmers to leverage the design and code of compilers for similar languages, or to simplify the evolution of existing languages to include more features. Rosetta is a generator of relational data language compilers that demonstrates practical solutions to these problems. We explain how domain analysis identifies primitive building blocks of these compilers, and how grammar-based definitions (e.g. GenVoca) of the legal compositions of these blocks yields compact and easily-evolvable specifications of data languages. Rosetta automatically transforms such specifications into compilers. Experiences with Rosetta are discussed.

V.P. Singhal. A Programming Language for Writing Domain-Specific Software System Generators . Ph.D. Dissertation. Department of Computer Sciences, University of Texas at Austin, September 1996.

Automating routine programming tasks is an effective way to increase the productivity of software development. Software system generators have the potential to achieve this goal: customized software systems can be quickly and easily assembled from component libraries. Our research demonstrates that for generators to be successful, component libraries must be scalable. Scalability enables libraries to be small, because the components of the library implement distinct and largely orthogonal features. These components can be combined to yield an enormous family of software systems and subsystems. Generators thus become tools for combining components to manufacture these systems and subsystems. In GenVoca, the programming model that forms the foundation of our research, components act as large-scale refinements which simultaneously transform multiple classes from one abstraction to another. Because GenVoca advocates a novel style of program organization, there is little language or tool support for this paradigm. We have developed a programming language called P++, which extends C++ with specialized constructs to support the GenVoca model. It permits components to be treated as transformations which can simultaneously refine several classes in a consistent manner. To validate the utility of this language, we solved a challenge problem in software reuse: we reimplemented the Booch C++ Components data structures library as a scalable P++ library. We were able to reduce the volume of code and number of components by approximately a factor of four, without compromising the performance of generated systems.

student
. Dinesh Das and D. Batory. Synthesizing Rule Sets for Query Optimizers from Components, . Technical Report TR-96-05, Department of Computer Sciences, University of Texas at Austin, April 1996

Query optimizers are complex subsystems of database management systems. Modifying query optimizers to admit new algorithms or storage structures is quite difficult, but partly alleviated by extensible approaches to optimizer construction. Rule-based optimizers are a step in that direction, but from our experience, the rule sets of such optimizers are rather monolithic and brittle. Conceptually minor changes often require wholesale modifications to a rule set. Consequently, much can be done to improve the extensibility of rule-based optimizers. As a remedy, we present a tool called Prairie that is based on an algebra of layered optimizers. This algebra naturally leads to a building-blocks approach to rule-set construction. Defining customized rule sets and evolving previously defined rule sets is accomplished by composing building-blocks. We explain an implementation of Prairie and present experimental results that show how classical relational optimizers can be synthesized from building-blocks, and that the efficiency of query optimization is not sacrificed
D. Batory. Software System Generators, Transformation Systems,and Compilers. . Working Paper, October 1995

GenVoca generators assemble customized, high-performance software systems automatically from components. In this paper, we explain how GenVoca generators are actually compilers for domain-specific module interconnection languages and that the underlying compilation technology is a special class of transformation systems
D. Batory. Software Component Technologies and Space Applications. International Conference on Integrated Micro-Nano Technology for Space Applications, November 1995

In the near future, software systems will be as reconfigurable than hardware. This will be possible through the advent of software component technologies, which have been prototyped in universities and research labs. In this paper, we outline the foundations for these technologies and suggest how they might impact software for space applications
. L. Tokuda. Program Transformations for Evolving Software Architectures. OOPSLA'95 position paper for workshop on Adaptable and Adaptive Software, 1995

Software evolution is often driven by the need to extend existing software. "Design patterns" express preferred ways to extend object-oriented software and provide desirable target states for software designs. This paper demonstrates that some design patterns can be expressed as a series of parameterized program transformations applied to a plausible initial software state. A software tool is proposed that uses primitive transformations to allow users to evolve object-oriented applications by visually altering design diagrams
D. Batory. Subjectivity and GenVoca Generators.. International Conference on Software Reuse (ICSR) (Orlando), 1996. See IEEE TSE journal version. Expanded Technical Report TR-95-32, Department of Computer Sciences, University of Texas at Austin, June 1995

The tenet of subjectivity is that no single interface can adequately describe any object; interfaces to the same object will vary among different applications. Thus, objects with standardized interfaces seem too brittle a concept to meet the demands of a wide variety of applications. Yet, objects with standardized interfaces is a central idea in domain modeling and software generation. Standard interfaces make objects plug-compatible and interchangeable, and it is this feature that is exploited by generators to synthesize high-performance, domain-specific software systems. Interestingly, generated systems have customized interfaces that can be quite different from the interfaces of their constituent objects.
In this paper, we reconcile this apparent contradiction by showing that the objects (components) in the GenVoca model of software generation are not typical software modules; their interfaces and bodies mutate upon instantiation to a "standard" that is application-dependent
. D. Batory. Issues in Domain Modeling and Software System Generation. OOPSLA'95 position paper for Panel on Objects and Domain Engineering, 1995
. D. Batory and J. Thomas. P2: A Lightweight DBMS Generator. Journal of Intelligent Information Systems. Also, Technical Report TR-95-26, Department of Computer Sciences, University of Texas at Austin, June 1995

A lightweight database system (LWDB) is a high-performance, application-specific DBMS. It differs from a general-purpose (heavyweight) DBMS in that it omits one or more features and specializes the implementation of its features to maximize performance. Although heavyweight monolithic and extensible DBMSs might be able to emulate LWDB capabilities, they cannot match LWDB performance. In this paper, we describe P2, a generator of lightweight DBMSs, and explain how it was used to reengineer a hand-coded, highly-tuned LWDB used in a production system compiler (LEAPS). We present results that show P2-generated LWDBs reduced the development time and code size of LEAPS by a factor of three and that the generated LWDBs executed substantially faster than versions built by hand or using an extensible heavy weight DBMS.
P3
D. Das. Making Database Optimizers More Extensible . Ph.D. Dissertation. Department of Computer Sciences, University of Texas at Austin, May 1995.

Query optimizers are fundamental components of database management systems (DBMSs). An optimizer consists of three features: a search space, a cost model, and a search strategy. The experience of many researchers has shown that hard-wiring these features results in an optimizer that is very inflexible and difficult to modify. Rule-based optimizers have been developed to alleviate some of the problems of monolithic optimizers. Unfortunately, contemporary rule-based optimizers do not provide enough support to enable database implementers (DBI) to fully realize the potential of open systems. We have identified four requirements that a rule-based optimizer should satisfy to address these needs. First, rules should be specified using high-level abstractions, insulating the DBI from underlying implementation details. Second, rule sets should be easily extensible, with a minimum of reprogramming required. Third, rule sets should be easily reconfigurable, that is, changeable to meet a variety of user needs, interfaces, database schemes, etc. Fourth, rule-based optimizers should be fast, that is, performance should not be sacrificed for the sake of high-level specifications. In this dissertation, we describe Prairie, an environment for specifying rules for rule-based optimizers that satisfies all four of the above requirements. The Prairie specification language is presented and we show how it allows a DBI to design an easily extensible rule set for a rule-based optimizer. Experimental results are presented using the Texas Instruments Open ODD optimizer rule set to validate the claim of good performance using Prairie. Finally, a building blocks approach of constructing rule sets is presented; this results in easily reconfigurable rule sets whose features are changeable simply by assembling the blocks in various ways.

student
D. Batory, Lou Coglianese, M. Goodwin, and S. Shafer. Creating Reference Architectures: An Example from Avionics.. Symposium on Software Reusability (SSR), Seattle Washington, April 1995

ADAGE is a project to define and build a domain-specific software architecture (DSSA) environment for assisting the development of avionics software. A central concept of DSSA is the use of software system generators to implement component-based models of software synthesis in the target domain. In this paper, we present the ADAGE component-based model (or reference architecture) for avionics software synthesis. We explain the modeling procedures used, review our initial goals, and examine what we were (and were not) able to accomplish. The contributions of our paper are the lessons that we learned; they may be beneficial to others in future modeling efforts

award
. D. Batory, Lou Coglianese, S. Shafer, and W. Tracz. The ADAGE Avionics Reference Architecture. AIAA Computing in Aerospace-10 Conference, San Antonio, March 1995

ADAGE is a project to define and build a domain-specific software architecture (DSSA) environment for avionics. A central concept of ADAGE is the use of generators to implement scalable, component-based models of avionics software. In this paper, we review the ADAGE model (or reference architecture) of avionics software and describe techniques for avionics software synthesis
. Dinesh Das and D. Batory. Prairie: A Rule Specification Framework for Query Optimizers. International Conference on Data Engineering (Taipei), March 1995

From our experience, current rule-based query optimizers do not provide a very intuitive and well-defined framework to define rules and actions. To remedy this situation, we propose an extensible and structured algebraic framework called Prairie for specifying rules. Prairie facilitates rule-writing by enabling a user to write rules and actions more quickly, correctly and in an easy-to-understand and easy-to-debug manner.
Query optimizers consist of three major parts: a search space, a cost model and a search strategy. The approach we take is only to develop the algebra which defines the search space and the cost model and use the Volcano optimizer-generator as our search engine. Using Prairie as a front-end, we translate Prairie rules to Volcano to validate our claim that Prairie makes it easier to write rules.
We describe our algebra and present experimental results which show that using a high-level framework like Prairie to design large-scale optimizers does not sacrifice efficiency
. D. Batory, D. McAllester, L. Coglianese, and W. Tracz. Domain Modeling in Engineering of Computer-Based Systems. 1995 International Symposium and Workshop on Systems Engineering of Computer Based Systems, Tucson, Arizona, February 1995

Domain modeling is believed to be a key factor in developing an economical and scalable means for constructing families of related software systems. In this paper, we review the current state of domain modeling, and present some of our work on the ADAGE project, an integrated environment that relies heavily on domain models for generating real-time avionics applications. Specifically, we explain how we detect errors in the design of avionics systems that are expressed in terms of compositions of components. We also offer insights on how domain modeling can benefit the engineering of computer-based systems in other domains
. L. Tokuda and D. Batory. Automated Software Evolution via Design Pattern Transformations. 3rd International Symposium on Applied Corporate Computing, Monterrey, Mexico, October 1995. Also TR-95-06, Department of Computer Sciences, University of Texas at Austin, February 1995

Software evolution is often driven by the need to extend existing software. "Design patterns" express preferred ways to extend object-oriented software and provide desirable target states for software designs. This paper demonstrates that some design patterns can be expressed as a series of parameterized program transformations applied to a plausible initial software state. A software tool is proposed that uses primitive transformations to allow users to evolve object-oriented applications by visually altering design diagrams

refactoring
J. Thomas and D. Batory. P2: An extensible Lightweight DBMS.. Technical Report TR-95-04, Department of Computer Sciences, University of Texas at Austin, February 1995

A lightweight database system (LWDB) is a high-performance, application-specific DBMS. It differs from a general-purpose (heavyweight) DBMS in that it omits one or more features and specializes the implementation of its features to maximize performance. Although heavyweight monolithic and extensible DBMSs might be able to emulate LWDB capabilities, they cannot match LWDB performance.
In this paper, we explore LWDB applications, systems, and implementation techniques. We describe P2, an extensible lightweight DBMS, and explain how it was used to reengineer a hand-coded, highly-tuned LWDB used in a production system compiler (LEAPS). We present results that show P2-generated LWDBs for LEAPS executes substantially faster than versions built by hand or that use an extensible heavyweight DBMS.
P3
D. Batory and B.J. Geraci. Validating Component Compositions in Software System Generators, International Conference on Software Reuse (ICSR) (Orlando), 1996. See IEEE TSE journal version. Also, Expanded Technical Report TR-95-03, Department of Computer Sciences, University of Texas at Austin, June 1995

Generators synthesize software systems by composing components from reuse libraries. In general, not all syntactically correct compositions are semantically correct. In this paper, we present domain-independent algorithms for the GenVoca model of software generators to validate component compositions. Our work relies on attribute grammars and offers powerful debugging capabilities with explanation-based error reporting. We illustrate our approach by showing how compositions are debugged by a GenVoca generator for container data structures

constraints, safe composition, DRC, design rules, design rule checking
E.E. Villarreal. Automated Compiler Generation for Extensible Data Languages . Ph.D. Dissertation. Department of Computer Sciences, University of Texas at Austin, December 1994.

To meet the changing needs of the DBMS community, e.g., to support new database applications such as geographic or temporal databases, new data languages are frequently proposed. Most offer extensions to previously defined languages such as SQL or Quel. Few are ever implemented. The maturity of the area of data languages demands that researchers go beyond the proposal stage to have hands-on experience with their languages, if only to separate the good ideas from the bad. Tools and methodologies for building families of similar languages are definitely needed; we solve this problem by automating the generation of compilers for data languages. Our work, Rosetta, is based on two concepts. First, underlying the domain of data languages is a common backplane of relational operations. Backplane operations are primitive building blocks for language execution and construction, where a building block has standardized semantics. The definition of a well-designed backplane is implementation-independent; that is, the backplane is defined once but can be used to model arbitrarily many data languages. Second, there exist primitive building-blocks for language construction. From our analysis of the database data language domain, we have identified three classes of building-blocks: one class maps language syntax to backplane functions, another builds an internal representation of the backplane operator tree, and a third class manages contextual information. For modeling data languages, we define the Rosetta specification language, a grammar-based specification language tailored to our needs with the power to define syntax, map it to the target language, and build an operator tree all in one rule. Thus each rule is a microcosmic model of a language clause which encapsulates input parsing and code generation. Our specification language models data languages based on the composition of primitive building blocks for semantics and the customization of the syntax for invoking the compositions. A compiler for a data language is generated by first modeling the language and then compiling the specification. The ease and efficiency with which Rosetta customizes languages derives from the reuse of the backplane operations and the high-level specification supported.

student
. D. Batory, J. Thomas, and M. Sirkin. Reengineering a Complex Application Using a Scalable Data Structure Compiler. ACM SIGSOFT FSE 1994 (New Orleans), December 1994

P2 is a scalable compiler for collection data structures. High-level abstractions insulate P2 users from data structure implementation details. By specifying a target data structure as a composition of components from a reuse library, the P2 compiler replaces abstract operations with their concrete implementations.
LEAPS is a production system compiler that produces the fastest sequential executables of OPS5 rule sets. LEAPS is a hand-written, highly-tuned, performance-driven application that relies on complex data structures. Reengineering LEAPS using P2 was an acid test to evaluate P2's scalability, productivity benefits, and generated code performance.
In this paper, we present some of our experimental results and experiences in this reengineering exercise. We show that P2 scaled to this complex application, substantially increased productivity, and provided unexpected performance gains.
P3
. D. Batory. The LEAPS Algorithms. Technical Report TR-94-28, Department of Computer Sciences, University of Texas at Austin, November 1994

LEAPS is a state of the art production system compiler that produces the fastest sequential executable OPS5 rule sets. The performance of LEAPS is due to its reliance on complex data structures and search algorithms to speed rule processing. In this paper, we explain the LEAPS algorithms in terms of the programming abstractions of the P2 data structure compiler
. D. Batory, B. Geraci, and J. Thomas. Introductory P2 System Manual.. Technical Report TR-94-26, Department of Computer Sciences, University of Texas at Austin, November 1994

P2 is a prototype container data structure precompiler. It is a superset of the C language offering container and cursor abstractions as part of the linguistic extensions to C. P2 is based on the GenVoca model of software system generators. This document is the users manual for programming in the P2 language.
P3
. D. Batory, B. Geraci, and J. Thomas. Advanced P2 System Manual. Technical Report TR-94-27, Department of Computer Sciences, University of Texas at Austin, November 1994

This manual documents how layers are written in P2. There is a special language, XP, which was designed specifically for defining P2 building blocks (i.e., primitive data structure layers).
P3
. D. Batory, V. Singhal, J. Thomas, S. Dasari, B. Geraci, and M. Sirkin. The GenVoca Model of Software-System Generators. IEEE Software, September 1994

An emerging breed of generators synthesize complex software systems from libraries of reusable components. These generators, called GenVoca generators, produce high-performance software and offer substantial increases in productivity.
P3
M.J. Sirkin. A Software System Generator for Data Structures . Ph.D. Dissertation. Department of Computer Science and Engineering, University of Washington, March 1994.

Although data structures are a fundamental part of most applications, using and writing data structures is time-consuming, difficult, and error-prone. Programmers often select inappropriate data structures for their applications because they do not know which data structure to use, they do not know how to implement a particular data structure, or they do not have an existing implementation of the data structure to use. This dissertation describes a model and a technology for overcoming these problems. Our approach is based on non-traditional parameterized types (NPTs). NPTs are an extension to traditional parameterized types (TPTs), which are already familiar to most programmers. Our NPTs are based on the GenVoca domain modeling concepts, vertical parameterization, a consistent high-level interface, and a transformational compiler. Our research has led to the construction of a software system generator for data structures called Predator. Predator is able to transform data structure declarations and data structure-independent functions into efficient code. Predator also allows programmers to adjust a data structure's implementation by simply changing its declaration and recompiling. This dissertation discusses our model (and how it differs from standard models), our Predator compiler, and the results of our validation efforts.

student P3
. D. Batory. Products of Domain Models. ARPA Domain Modeling Workshop, George Mason University, September 1994

We argue that domain models should produce four basic products: identification of reusable software components, definition of software architectures that explain how components can be composed, a demonstration of architecture scalability, and a direct relationship of these results to software generation of target systems
. D. Batory, V. Singhal, J. Thomas, and M. Sirkin. Scalable Software Libraries. ACM SIGSOFT FSE 1993 (Los Angeles), December 1993

Many software libraries (e.g., the Booch C++ Components, libg++, NIHCL, COOL) provide components (classes) that implement data structures. Each component is written by hand and represents a unique combination of features (e.g. concurrency, data structure, memory allocation algorithms) that distinguishes it from other components. We argue that this way of building data structure component libraries is inherently unscalable. Libraries should not enumerate complex components with numerous features; rather, libraries should take a minimalist approach: they should provide only primitive building blocks and be accompanied by generators that can combine these blocks to yield complex custom data structures. In this paper, we describe a prototype data structure generator and the building blocks that populate its library. We also present preliminary experimental results which suggest that this approach does not compromise programmer productivity nor the run-time performance of generated data structures.
P3
. V. Singhal and D. Batory. P++: A Language for Software System Generators.. Technical Report TR-93-16, Department of Computer Sciences, University of Texas at Austin, November 1993

P++ is a programming language that supports the GenVoca model, a particular style of software design that is intended for building software system generators. P++ is an enhanced version of C++: it offers linguistic extensions for component encapsulation, abstraction, parameterization, and inheritance, where a component is a suite of interrelated classes and functions. This paper describes the motivations for P++, the ideas which underlie its design, the syntax and features of the language, and related areas of research
J. Thomas, D. Batory, V. Singhal, and M. Sirkin. A Scalable Approach to Software Libraries. 6th Annual Workshop on Software Reuse (Owego, New York), November 1993

Software libraries offer a convenient and accessible means to achieve the benefits of reuse. The components of these libraries are written by hand, and each represents a unique combination of features that distinguishes it from other components. Unfortunately, as the number of features grows, the size of these libraries grows exponentially, making them unscalable.
Predator is a research project to develop abstractions and tools to provide the benefits of software libraries without incurring the scalability disadvantages just mentioned. Our approach relies on a careful analysis of an application domain to arrive at appropriate high-level abstractions, standardized (i.e., plug-compatible) interfaces, and layered decompositions. Predator defines language extensions for implementing components, and compilers to automatically convert component compositions into efficient programs.
P3
V. Singhal and D. Batory. P++: A Language for Large-Scale Reusable Software Components. 6th Annual Workshop on Software Reuse (Owego, New York), November 1993

P++ is a programming language that supports the GenVoca model, a particular style of software design that is intended for building software system generators. P++ is an enhanced version of C++: it offers linguistic extensions for component encapsulation, abstraction, parameterization, and inheritance, where a component is a subsystem, i.e., a suite of interrelated classes and functions

award
. M. Sirkin. Predator: A Data Structure Compiler.. A manual describing the features and syntax of P1, a prototype data structure compiler (unpublished)
M. Sirkin, D. Batory, and V. Singhal. Software Components in a Data Structure Precompiler. International Conference on Software Engineering (ICSE) (Baltimore, MD), May 1993, pages 437-446

PREDATOR is a data structure precompiler. that generates efficient code for maintaining and querying complex data structures. It embodies a novel component reuse technology that transcends traditional generic data types. In this paper, we explain the concepts of our work and our prototype system. We show how complex data structures can be specified as compositions of software building blocks, and present performance results that compare PREDATOR output to hand-optimized programs
D. Batory, V. Singhal, and J. Thomas. Database Challenge: Single Schema Database Management Systems. Technical Report TR-92-47, Department of Computer Sciences, University of Texas at Austin, December 1992

Many data-intensive applications require high-performance data management facilities, but utilize only a small fraction of the power of a general-purpose database system (DBMS). We believe single schema database systems (SSTs), i.e., special-purpose DBMSs that are designed for a single schema and a predeclared set of database operations, are a vital need of today's software industry. The challenge is to create a technology for economically building high-performance SSTs. SST research will combine results from object-oriented databases, persistent object stores, module interconnection languages, rule-based optimizers, open-architecture systems, extensible databases, and generic data types.
P3
. D. Batory and S. O'Malley. The Design and Implementation of Hierarchical Software Systems with Reusable Components. ACM Transactions on Software Engineering and Methodology (TOSEM), 1(4):355-398, October 1992

We present a domain-independent model of hierarchical software system design and construction that is based on interchangeable software components and large-scale reuse. The model unifies the conceptualizations of two independent projects, Genesis and Avoca, that are successful examples of software component/building-block technologies and domain modeling. Building-block technologies exploit large-scale reuse, rely on open architecture software, and elevate the granularity of programming to the subsystem level. Domain modeling formalizes the similarities and differences among systems of a domain. We believe our model is a blue-print for achieving software component technologies in many domains

GenVoca
D. Batory, V. Singhal, and M. Sirkin. Implementing a Domain Model for Data Structures. International Journal of Software Engineering and Knowledge Engineering, 2(3):375-402, September 1992

We present a model of the data structure domain that is expressed in terms of the GenVoca domain modeling concepts. We show how familiar data structures can be encapsulated as realms of plug-compatible, symmetric, and reusable components, and we show how complex data structures can be formed from their composition. The target application of our research is a precompiler for specifying and generating customized data structures
D.S. Batory. The Genesis Papers: Volume 1 . UTCS Technical Report, June 1991.

This is a collection of major publications on the Genesis extensible DBMS project. There are six papers overall. Three explain the Genesis open-system architecture concept, two papers discuss its implementation, and the last paper is the Genesis Unix User's Manual.

feature modules, layered designs, GenVoca, DRC, design rules, design rule checking

Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.