Generation Scoping

Generation Scoping is a variation of hygienic macros. It is an essential feature of metaprogramming (i.e., the ability to compose independently-written code fragments and avoiding inadvertent capture). This section presents the capabilities of the AHEAD generation scoping component (gscope) that is used with the AHEAD AST component:

Inadvertent Capture and Hygienic Macros
Environments
Environments in Interfaces
Environment Hierarchies
Aliasing
Environment Augments

A more general form of generation scoping (which has been added to Microsoft's IP project) is discussed in:

Yannis Smaragdakis and Don Batory, Scoping Constructs for Program Generators, First Symposium on Generative and Component-Based Software Engineering, Erfurt, Germany, September 1999.

Inadvertent Capture and Hygienic Macros

Inadvertent capture arises when two independently written and correct code fragments are composed, but their composition is incorrect because of ambiguities in the naming of variable identifiers. Consider the following macro (parameterized code fragment) which defines a variable temp:

macro(x) { int temp = 2*x; ... }

The following application defines a variable, also called temp, and invokes macro(x):

int temp = 7;
macro(temp);

When macro(x) is expanded, incorrect code is produced:

int temp = 7;
{ int temp = temp;  ... }   // incorrect

Instead of the inner temp variable being initialized with value of the outer temp variable; the uninitialized inner temp variable is used to initialize itself! This is an example of inadvertent capture: the temp identifiers are insufficient to disambiguate the variables that they reference.

Hygienic, lexically-scoped macros (HLSM) were designed to solve the inadvertent capture problem. HLSM relies on a painting algorithm that ensures identifiers are bound to the correct variables. Often, HLSM is implemented as a preprocessing step that mangles variable names to ensure their uniqueness:

int temp_0 = 7;
{ int temp_1 = 2*temp_0;  ... }   // correct

HLSM's applicability is limited to macros (i.e., pattern-based source code transformations). Since AHEAD supports programmatic (as opposed to pattern-based) AST construction, AHEAD uses an adaptation and generalization of HLSM called generation scoping (GS). In the following sections, we review the features of GS.

Environments

A GS environment is a list of identifiers (i.e., class or interface names, data member or method names, etc.) that are local to a set of related code fragments that are to be generated. To ensure the absence of inadvertent capture, identifiers that are local to these code fragments are mangled. Associated with each environment instance is a unique mangle number, an integer that is attached to an identifier to make it unique. For example, if an environment's mangle number is 005 and identifier i is to be mangled, identifier i_005 is produced.

Environments are associated with classes; environment instances are associated with objects. Class foo below defines an environment with identifiers i and j. Each foo instance creates an environment with identifiers i and j. Different foo instances represent distinct environment instances. Whenever a tree constructor is evaluated by a foo object, it does so in the context of that object's environment. Thus, if x and y are distinct foo instances, and x.bar() and y.bar() return code fragments, the returned fragments will be isomorphic in structure, but will have different names for variables i and j.

class foo {
    environment i, j; // identifiers to mangle
    AST_Exp bar() { return exp{ i + j }exp; }
}

foo x = new foo();   // assume mangle number is 000
foo y = new foo();   // assume mangle number is 001

x.bar();             // produces AST "i_000 + j_000"
y.bar();             // produces AST "i_001 + j_001"

With the above capabilities, the problem of inadvertent capture presented earlier can be avoided. One defines a class (macroExample) with an environment that contains the temp identifier. A method of this class (macroCode) uses a tree constructor to manufacture the body of the "macro". The temp variable that is defined internally to that code fragment is given a unique name via mangling, so inadvertent capture cannot arise.

class macroExample {
    environment temp;
    AST_Stmt macroCode( AST_QualifiedName n ) {
        return stm{ int temp = 2*$id(n); ... }stm;
    }
}
Programming note: An environment statement is a declaration, much like a data member and method declaration. In fact, when the environment statement is reduced to Java, a variable declaration and method replace the environment statement. The variable and method are non-static, thus, environments and their code constructors cannot be referenced within static methods.

Environments in Interfaces

Objects with environments have hidden data members and methods that implement the environment construct. Interfaces to such classes must expose these hidden methods. To accomplish this, an empty environment declaration can appear inside an interface declaration. At reduction time, this declaration will be replaced by the hidden method(s) used by environment implementations. An example is shown below:

interface fooInt {
    environment;         // empty - it declares that fooInt
    ...                  // classes have environment declarations
}

class foo implements fooInt {
    environment ...;     
}

// application code
fooInt x = new foo();

Environment Hierarchies

Environment instances can be organized hierarchically to emulate scopes in the name space of generated programs. As expected, identifiers of parent environments are visible in child environments and identifiers that are declared in a child environment hide identifiers in parent environments with the same name. Parent linkages among environments are made at application run-time using the environment parent declaration. The example below shows that instances of class baz make their environments children of environments of foo objects. Note that a tree constructor for the expression i+k produced by a baz object yields i_000+k_002 because identifier i is mangled by the foo environment while k is mangled by the baz environment:

class baz {
    environment k;

    baz( foo z ) {
        environment parent z;   // z is the parent environment
                                // of the constructed object
    }

    AST_Exp biff() { return exp{ i + k }exp; }
}

baz r = new baz(x);     // recall x has mangle number 000
                        // assume mangle number for r is 002
r.biff();               // produces AST "i_000 + k_002"

As a more realistic example, consider the following innocent-looking statement taken from the domain of data structures:

cursor.second = container.first.next;

The second field of a cursor object is to be initialized with a pointer to the second element of a container. This pointer is obtained by following the first pointer (stored in a container object) and the next pointer (of an element object). Note the following: second is a field of a cursor class/object; first is a field of a container class/object; next is a field of an element class/object. Each of these fields is defined in a separate scope. The need for environments arises because a container object can have many first fields, a cursor object can have many second fields, an element object can have many next fields. It is important that the names of these fields be distinguished, and that consistent names are used for second, first, and next fields when the above statement is generated.

A technique that we have used in the development of the AHEAD P3 data structure generator is to define a class for each P3 layer. Within a layer class is an environment declaration that lists the the union of the set of names of all local identifiers added to the cursor, container, and element classes that this layer refines. Separate classes are then defined to generate the code for each cursor, container, and element class. Instances of these cursor, container, and element classes have environments that are children of their corresponding layer object. Thus, any ASTs that are generated within the cursor, container, and element classes will have their local identifiers mangled consistently.

class layer {
    environment i, j;   // list of local identifiers 
                        // added by this layer

    cursor make_cursor_layer() { 
        return new cursor_layer(this);
    }

    container make_container_layer() {
        return new container_layer(this);
    }

    element make_element_layer() { 
        return new element_layer(this);
    }
}

class cursor_layer {
    environment;        // empty, as all names of local 
                        // cursor variables are added
                        // to the layer's environment list

    cursor_layer( layer l ) {
        environment parent l;   // environment of l is parent 
    }                           // of constructed object

    AST_Exp foo() { return exp{ i + j }exp; }
}

class container_layer {
    environment;        // empty

    container_layer( layer l ) {
        environment parent l;   // environment of l is parent
    }                           // of constructed object

    AST_Exp foo() { return exp{ i + j }exp; }
}

class element_layer {
    environment;        // empty

    container_element( layer l ) {
        environment parent l;   // environment of l is parent
    }                           // of constructed object

    AST_Exp foo() { return exp{ i + j }exp; }
}

// application code.  objects k, c, e are used to 
// generate container, cursor, and element code

l = new layer();                // assume mangle number 007
k = l.make_container_layer();
c = l.make_cursor_layer();
e = l.make_element_layer();

k.foo();                // returns "i_007 + j_007"
c.foo();                // returns "i_007 + j_007"
e.foo();                // returns "i_007 + j_007"

Note that it is possible to create directed acyclic graphs of parent relationships among environments (e.g., an environment that has multiple parent environments). Such connections can be useful, but programmers should be careful. If an identifier i is defined in multiple parents, where there is no parent-child relationship between these two parent environments, then there is no guarantee that identifier i will be mangled correctly or consistently.

Aliasing

Escapes define a mechanism for explicit parameterization of ASTs. Aliasing provides a less powerful, but none-the-less useful form of implicit parameterization. The idea is to bind a code fragment to a local identifier. Instead of mangling the identifier, it is replaced with its aliased code fragment.

Recall the macro example that illustrated how inadvertent capture could be avoided. We can recast its implementation using aliases:

class macroExample {
    environment temp, x;

    AST_Stmt macroCode( AST_QualifiedName n ) {
        alias( x, n );
        return stm{ int temp = 2*x; ... }stm;
    }
}

The general form of an alias declaration is:

alias( identifier, AST );

Aliasing is generally useful in building generators for domain-specific languages. Common parameters to code fragments can be aliased identifiers, rather than having explicit escapes. In this way, code fragments can be much more readable. For example, when we built the P3 generator, we frequently made reference to a container object. This reference expanded into possibly different code fragments, depending on how the container was implemented. Rather than having explicit escapes (i.e., $exp(ContainerRef)) littered throughout our code fragments, we created an aliased identifier (ContainerRef) and used it instead. This made our code fragments easier to write and read.

There are different types of ASTs. Unfortunately, only a subset of these types can be used with aliases. The above showed examples of ASTs of type AST_QualifiedName. The complete list of AST types can be used with aliases is:

Allowable AST Types for Aliasing	Meaning
AST_QualifiedName	Java qualified names (e.g., "x.y.z")
AST_Exp	expressions
AST_Stmt	list of statements
AST_FieldDecl	list of data members and methods
AST_Plst	list of function arguments
AST_Class	list of class and interface definitions

The reason why other AST types cannot be used is due to the limited places that identifiers can appear in the Java grammar without introducing ambiguities. No ambiguities are introduced for the above types.

Environment Augments

Environments statically declare the set of identifiers that are local to a set of related code fragments. Occasionally, it is necessary for an application generator to manufacture local identifiers at run-time. To add identifiers dynamically to an environment, the "environment augment" construct is used. It takes a single parameter (of type String), and adds this string to the list of identifiers of the current environment. The example below illustrates the concept.

class EnvAl {
    environment i;

    AST_Exp foo() { return exp{ i + j }exp; }

    void Add2Env( String str ) {
        environment augment str;
    }
}

// application code

EnvAl x = new EnvAl();  // assume mangle number 008

x.foo();                // outputs "i_008 + j"
x.Add2Env("j");         // adds j to environment of x
x.foo();                // outputs "i_008 + j_008"

AHEAD Home Page