The bali2jak Tool

The bali2jak tool is a command-line application that generates a parse-tree layer from a Bali grammar file. This document describes how bali2jak is invoked and it briefly describes the classes generated in the parse-tree layer. The reader should be familiar with the Bali language documentation and the HowTo guide.

Invoking bali2jak

The bali2jak tool accepts a single Bali grammar file as input and it produces an entire directory of generated Jak source files. Each Jak source file contains a class definition as described in the next section. For now, we concentrate on the command invocation.

Suppose the Bali input grammar is in grammar.b and the generated files are to be placed into directory parse. Then, an appropriate command would be:

bali2jak grammar.b -directory parse

That's it! This produces a layer in directory parse, creating the directory if necessary. In this example, the layer is not given an explicit name. In general, though, the bali2jak command can accept an optional -layer option that specifies a layer name. For example, if the layer is supposed to be named syntax, that can be accomplished with the following command:

bali2jak grammar.b -directory parse -layer syntax

At this point, you've seen all there is to bali2jak invocation. The command has only three arguments: (1) The input file name, which is required; (2) The output directory name, which is optional and which defaults to the current directory; and (3) An optional layer name argument, which has no default.

Generated Class Files

This section should be read in combination with the older abstract syntax tree documentation (AST). The AST document is slightly outdated, but is essentially correct. Eventually, that document will be merged with this one but, for now, both should be read to get a full understanding of the generated classes.

When bali2jak generates classes from a Bali grammar file, it builds inheritance hierarchies in a class framework for the non-terminal symbols defined by Bali grammar rules. Each type of Bali production specifies slightly different parts of the framework and this section describes the mapping of the Bali productions to the class framework. We assume the reader is familiar with the four kinds of Bali grammar productions: (1) Named productions; (2) Sub-productions (also called unnamed productions or unnamed rules); (3) Simple lists; and (4) Complex lists.

Briefly, every Bali non-terminal symbol corresponds to a class definition in the generated code. The classes generated for rules with list productions will be concrete, but all other classes corresponding to non-terminal symbols will be abstract. The only other concrete classes to be generated will be those corresponding to named productions and to list elements. Generated classes will differ in their contents and in their inheritance hierarchy. Details are given in the paragraphs below where we assume that bali2jak was invoked as follows:

bali2jak grammar.b -directory parse -layer syntax

Named productions. Each named production corresponds to a concrete class in the class framework. For example, we have the following rule that contains one named production:

If : "if" Test ThenClause [ElseClause] :: IfNode ;

Then bali2jak will generate a concrete class named IfNode to correspond to this production. Further, since the rule If is a non-terminal that doesn't specify a list production, an abstract class named If will be generated for this rule. Finally, IfNode will be defined as a sub-class of If.

The IfNode class will be generated with only a default constructor, but several methods will be generated. Arguably, the most important method is an initializer named setParms which has four arguments, one corresponding to each primitive element in the named production. Method setParms allocates the arg array of non-terminals, the tok array of terminals, and the arguments are placed into the appropriate position in those arrays. The code generated for IfNode's setParms method is as follows:

final public static int ARG_LENGTH = 3 ;
final public static int TOK_LENGTH = 1 ;

public IfNode setParms
(AstToken tok0, Test arg0, ThenClause arg1, AstOptNode arg2) { 
    arg = new AstNode [ARG_LENGTH] ;
    tok = new AstTokenInterface [TOK_LENGTH] ;
        
    tok [0] = tok0 ;              /* "if" */
    arg [0] = arg0 ;              /* Test */
    arg [1] = arg1 ;              /* ThenClause */
    arg [2] = arg2 ;              /* [ElseClause] */
        
    InitChildren () ;
    return (IfNode) this ;
}

Every named production will have a setParms method that accepts the productions primitive elements as arguments and initializes the two arrays arg and tok. Similarly, every named production will contain a printorder method that specifies which arguments to setParms are terminals and which are non-terminals. The printorder method returns a boolean array that's the same length as setParm's argument list and the entries in the returned array are true exactly when the corresponding setParm argument is a terminal. Here's the printorder method for IfNode:

public boolean[] printorder () { 
    return new boolean[] {true, false, false, false} ;
}

Finally, each class generated for a named production contains an accessor method for each unique named primitive element. For example, the If production has three named primitive elements: Test, ThenClause and ElseClause. All of these names are unique within this production, so bali2jak will generate an accessor method for each one. The remaining primitive element is a quoted string that hasn't been assigned a name, so no accessor method will be generated for it. Here are IfNode's three accessors:

public ElseClause getElseClause () { 
    AstNode node = arg[2].arg [0] ;
    return (node != null) ? (ElseClause) node : null ;
}

public Test getTest () { 
    return (Test) arg [0] ;
}

public ThenClause getThenClause () { 
    return (ThenClause) arg [1] ;
}

Note that all accessors return typed nodes. Further, for optional elements such as ElseClause, the corresponding accessor method returns a null if the element wasn't present.

Sub-productions: Consider the following rule that has two sub-productions:

Statement : Assignment | Control ;

This example specifies that the two non-terminals Assignment and Control derive from non-terminal Statement. This relationship is reflected in the class framework by defining abstract classes Assignment and Control that extend class Statement. Here are the two generated classes:

abstract public class Assignment extends Statement { }

abstract public class Control extends Statement { }

Similar classes are generated for all sub-productions. Note that there are no methods generated within these classes.

Simple lists: A simple list is a rule with exactly one production as shown below:

Simple : (Symbol)+ ;

In this case, two classes are generated for rule Simple: A class Simple that extends kernel class AstList, and a class SimpleElem that extends AstListNode. The Simple class represents the list itself and it inherits standard list-traversal methods although it adds no new methods. On the other hand, the SimpleElem class represents the elements within the list. It contains two generated methods: An initializer method setParms that initializes the element with a Symbol primitive; and an accessor method that returns the Symbol primitive from the list element. Here's the generated code for Simple and SimpleElem:

public class Simple extends AstList { }

public class SimpleElem extends AstListNode {

    public Symbol getSymbol () { 
        return (Symbol) arg [0] ;
    }

    public SimpleElem setParms (Symbol arg0) { 
        Super(AstNode).setParms (arg0) ; /* Symbol */
        return (SimpleElem) this ;
    }

}

Note that the methods generated for SimpleElem are similar to methods generated for named productions.

Complex lists: A complex list is similar to a simple list in some key respects. First, a rule defining a complex list can have exactly one complex list production as shown in the following example:

Program : Statement (";" Statement)* ;

The general form of a complex list production is a leading primitive followed by zero or more trailing primitives, all separated by some a separator primitive. In the example above, the leading primitive is of the same type as the trailing primitives, though this is not required. As with a simple list, bali2jak generates two classes: One representing the list itself and another representing the elements of the list. For our example, the Program class represents the list and class ProgramElem represents a list element. Here are the two classes generated for our example:

public class Program extends AstList { }

public class ProgramElem extends AstListNode {

    public Statement getStatement () { 
        return (Statement) arg [0] ;
    }

    public ProgramElem setParms (AstToken tok0, Statement arg0) { 
        tok = new AstToken [1] ;
        tok [0] = tok0 ;              /* ";" */
        return setParms (arg0) ;      /* Statement */
    }

    public ProgramElem setParms (Statement arg0) { 
        Super(AstNode).setParms (arg0) ; /* Statement */
        return (ProgramElem) this ;
    }

}

Again, the Program is an empty class and the ProgramElem class contains an accessor method to return the list element. However, ProgramElem contains two initializer methods with different signatures. The setParms method with a single argument is used to initialize the leading element of the list while the two-argument version initializes a trailing element with its preceding separator element.

Example (with Generated Code!)

This section contains a fully worked-out example. First, we show a complete listing of the example grammar, some of which was used in the previous sections. We make no representation that this is a useful grammar!

grammar.b
TOKEN : {
    <COMPARISON : "==" | "!=">
    | <IDENTIFIER : <LETTER> (<LETTER> | <DIGIT>)*>
    | <INTEGER : <DIGIT> <DIGIT>*>
    | <#LETTER: ["a"-"z", "A"-"Z"]>
    | <#DIGIT: ["0"-"9"]>
}

Program : Statement (";" Statement)* ;

Statement : Assignment | Control ;

Assignment : IDENTIFIER "=" Expression :: AssignmentNode ;

Expression : Operand [Addition] :: ExpressionNode ;

Operand : IDENTIFIER :: IdentifierOperand
	| INTEGER :: IntegerOperand ;

Addition : "+" Expression :: AdditionNode ;

Control : If | While ;

If : "if" Test ThenClause [ElseClause] :: IfNode ;

Test : "(" Expression COMPARISON Expression ")" :: TestNode ;

ThenClause : "then" Assignment :: ThenNode ;

ElseClause : "else" Assignment :: ElseNode ;

While : "while" Test "do" Assignment :: WhileNode ;

The files generated from this grammar by bali2jak are shown below. Please follow a link to see the generated code for each class:

Generated with command
bali2jak grammar.b -directory parse -layer syntax
Addition ElseClause If ProgramElem ThenClause
AdditionNode ElseNode IfNode Statement ThenNode
Assignment Expression IntegerOperand Test While
AssignmentNode ExpressionNode Operand TestNode WhileNode
Control IdentifierOperand Program  

ATS Home Page

Copyright © Software Systems Generator Research Group. All rights reserved.
Last modified: Thu May 1 13:25:08 CDT 2003

Jacob Sarvela