Internal Representation of AHEAD Abstract Syntax Trees/Parse Trees

This section presents the low-level, programmatic capabilities AHEAD offers for creating and manipulating abstract-syntax trees (ASTs) of Java and extended-Java programs.  While the discussion centers on Java dialects, the basic technology applies to all AHEAD-produced languages.

  • AST Node Hierarchy
  • Structure of an AST Node
  • Relationship of AST Nodes to Bali Productions
  • How is an AST printed?
  • Print Information Stored in AST Nodes
  • AST Printing Idioms
  • Normalizing Trees
  • Detaching Trees
  • Methods on AST Nodes
  • AST Node Hierarchy

    wpe3.gif (2264 bytes)AHEAD relies on a hierarchy of different kinds of AST nodes.  The kernel hierarchy is shown in the figure below, where AstNode is the root of the AST class hierarchy; all tokens are instances of AstToken.

    Note that all Bali-generated classes are subclasses of AstNode.

    Structure of an AST Node

    wpe5.gif (1971 bytes)An AstNode has an array (arg) of zero of more children (of type AstNode), a left and right to connect a node with its siblings, and an up pointer to its parent. Not depicted is an array (tok) of zero or more tokens.  The up pointer is null if an AST node has no parent; left is null if there is no left sibling, and similarly for right.  A terminal node of an AST has (obviously) zero children; the arg array is null.

    In general, programmers need not know about these pointers, although it may be useful information.  The code below sketches the definition of the AstNode class (which is defined in the kernel layer).

    static public abstract class AstNode implements Cloneable, Serializable {
       public Token[]     tok;
       public AstNode[]   arg;
       public AstNode     right;
       public AstNode     left;
       public AstNode     up;

    Relationship of AST Nodes to Bali Productions

    AST classes are generated from Bali grammar specifications. Consider a Bali production that is a named rule:

            : TOKEN1 NonTerm2 TOKEN2 TOKEN3 NonTerm3 :: AClass

    wpe6.gif (1649 bytes)Bali generates a class called AClass that is instantiated whenever the pattern associated with AClass is recognized.  The AST that is produced is shown to the right: an instance of AClass roots an AST that has two children: an instance of NonTerm2 and an instance of NonTerm3 (both objects of which are siblings of each other).  The contents of the arg[] array and tok[] array is:

    arg[0] = // pointer to NonTerm2 instance
    arg[1] = // pointer to NonTerm3 instance
    tok[0] = // pointer to TOKEN1 instance
    tok[1] = // pointer to TOKEN2 instance
    tok[2] = // pointer to TOKEN3 instance

    How is an AST printed?

    How an AST is printed can be answered in two levels of detail. In the most detailed, we'll see the information that is stored within an AST node for printing. In the more typical interpretation, click here to see the standard idioms to use when programming in AHEAD.

    Print Information Stored in AST Nodes

    In order to print the AST rooted at the AClass node (or any node for that matter), we need to be able to print each subtree and each of the nodes tokens in the order prescribed by the AClass named rule.  This is accomplished by a boolean array order[] which specifies the interweaving of token and subtree printing.  In the AClass example above:

    boolean order[] = { true, false, true, true, false };

    That is, order[0] means print the first token (tok[0]), order[1] means print the first subtree (arg[0]), order[2] means print the second token (tok[1]), and so on.  The order array is generated by Bali and is otherwise unseen by users.

    Note also that tokens store both the text of the token plus the white space (comments, new lines, etc.) that preceded the token during a parse.  In this way, it is possible for AHEAD-produced precompilers to parse an input file and unparse it to produce an output file that is Unix-diff equivalent to the input file.  (Stated another way, saving the white space allows one to transform programs while retaining comments.  This is essential for tools that automatically modify programs which are subsequently hand-edited by programmers).

    Below are methods that are useful for tokens.  These methods are invoked by the pattern tok[i].method() -- meaning apply method() to token #i:

    void setTokenName(String x) set the token string to x
    String getTokenName() returns string of token #i
    int linenum() returns line number on which the token appeared
    void print() prints the token and whitespace
    void reduce2java(AstProperties p) reduces to Java (prints token)
    void reduce2ast(AstProperties p) generate code that will create this token

    Normalizing Trees

    In general, AHEAD allows no sharing of AST subtrees - i.e., multiple parent nodes point to the same child AST subtree.  Now, sharing subtrees is too easy to do in AHEAD metaprogramming:

    AST_Exp e = exp{ 4+5 }exp;
    AST_Stmt s = stm{ System.out.println( $exp(e), $exp(e)); }stm; //sharing!

    The above code is not correct if it is to be subsequently manipulated.  (If it is merely to be printed, then tree sharing is OK).  One fix to the above code would be to clone each AST before linking it in. 

    AST_Exp e1 = (AST_Exp) e.clone();
    AST_Stmt s = stm{ System.out.println( $exp(e), $exp(e1)); }stm;

    Another and simpler is to normalize the tree.   The normalize method walks a tree and locates shared subtrees; it clones each shared subtree so that the resulting AST has no tree sharing.

    AST_Exp e = exp{ 4+5 }exp;
    AST_Stmt s = stm{ System.out.println( $exp(e), $exp(e)); }stm; //sharing!
    s.normalize();       // tree sharing removed

    Detaching Trees

    Occasionally it is useful to detach a subtree from an AST.   (That is, all parent and sibling linkages are removed). A typical application is that the detached tree is to be serialized to disk for some later retrieval and manipulation.   The following idiom illustrates the usage of Detach.

    // let a be the node that is to be detached from an AST
    // and is to be written to file "myfile"

    Methods on AST Nodes

    The following are methods that can be performed on AST nodes.

    Method Semantics
    String toString() convert tree rooted at AST node to its string representation
    Object clone() clone tree rooted at AST node (Note: you must cast the result to the appropriate type before use)
    AstNode hasAncestor(String classname) progress up the AST looking for a parent node that is an instance of classname (or an instance of some derived class)
    void PrettyDump() dump the contents of an AST
    (useful for debugging)
    void dumpnode() useful for debugging small ASTs
    AstNode Replace(AstNode withnode) Replace(k) swaps "this" with node k. Replace returns k as a result.
    void print() unparses the contents of the AST rooted at "this" node
    void print(AstProperties props) unparses "this" tree and outputs the result to file specified in props
    reduce2java(AstProperties props) prints the Java code that is defined by the Jakarta AST
    that is rooted at "this" node
    boolean Equ( AstNode x ) return true if tree rooted by x is equal to "this"
    void Delete() delete the current node; node must be an element of some list structure
    void add( AstList tree ) adds tree after the current node. (Note: "this" must be an AstList).
    void add( AstList tree, String separator) just like add (above), but the separator is also specified. Useful for comma-separated lists.
    void AddAfter(AstList tree) add tree after "this" node; "this" node must
    be an element of some list structure
    void AddBefore(AstList tree) add tree before "this" node; "this" node must
    be an element of some list structure
    AstNode addComment(String comment) is method appends the comment given by the parameter
    to the first AstToken
    AstNode addComment(String comment, boolean replace) optionally add or replace comment in front of first token
    void appendComment(String x) add comment x to the specified tree after the first white space; tree must have a token in it
    void prependComment(String x) add comment x to the specified tree before the first white space; tree must have a token in it
    AstNode normalizeTree(  ) This method is used to repair ASTs that share subtrees which might have been created during sloppy metaprogramming.  The idea is to walk the tree looking for nodes that don't point to their parent. In such cases, the subtree is cloned and the result is that normalized trees don't have shared subtrees
    void Detach() Detach isolates a subtree from its left, up, and right node relations. Useful for disconnecting a tree from an existing tree
    void writeTree( String fileName ) serializes an AST to a file (filename.ser); make sure AST is detached before calling writeTree()

     AHEAD Home Page

    Copyright Software Systems Generator Research Group. All rights reserved.
    Revised: April 17, 2003.