Inlining in Scale

Inlining is the process of inserting the body of a routine in place of a call to the routine. Inlining is performed in order to improve execution time by:

avoiding the overhead of the routine call, and
constructing larger basic blocks.

The disadvantages of inlining can be compiled-code bloat and increased compile time.
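
As a small illustration of the trade-off, the sketch below shows a call site before and after inlining by hand. The routine names are hypothetical and are not taken from Scale, and the comment about basic blocks assumes a compiler that ends a basic block at each call.

```c
/* Hypothetical routine, used only for illustration. */
static int square(int x)
{
    return x * x;
}

/* Before inlining: every iteration pays the call overhead, and in a
   compiler that ends a basic block at a call, the loop body is split
   into smaller blocks. */
int sum_squares_call(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += square(i);
    return sum;
}

/* After inlining by hand: the body of square() replaces the call, so
   the loop body contains no call. The cost is that the body of
   square() is now duplicated at the call site. */
int sum_squares_inlined(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i * i;
    return sum;
}
```

Both versions compute the same result; only the shape of the generated code differs.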

Several languages make explicit the programmer's desire to have a routine inlined. An example is C++ with its inline keyword.

Java does not make inlining explicit but, rather, makes it easy for the compiler to accomplish it. It does this through its byte-codes, its stack-oriented operations, and its edit-compile-test philosophy.

Scale currently deals with languages (C and Fortran) that have no explicit provisions for inlining. However, inlining can still have some powerful advantages. Some hardware architectures demand very large basic blocks in order to be effective, whereas a typical C program results in very small basic blocks. The availability of inlining may allow the programmer to use a more object-oriented programming style in C without suffering the ill effects of the resulting small basic blocks.

Inlining Criteria

A criterion for picking routines as candidates for inlining is required. The criteria chosen may affect when inlining can be performed: if a criterion depends on a metric that is only available at certain points in compilation, then inlining cannot be performed until that metric is available. The choice of criteria also determines the information about each routine that must be retained. For example, a criterion such as "don't inline if the inlined code is bigger than the calling code" would require that inlining be deferred until the instructions for each routine have been generated, or at least until a good estimate of their size has been obtained.
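
The size-based example above can be sketched as follows. This is a minimal illustration, not Scale's implementation: the structure, its field, and the function name are assumptions introduced here to show what information would have to be retained for each routine.

```c
#include <stdbool.h>

/* Hypothetical per-routine summary; the field is an assumption and
   does not correspond to a Scale data structure. */
struct routine_info {
    int estimated_size;   /* estimated number of instructions */
};

/* Example criterion from the text: do not inline when the inlined
   code (the callee) is bigger than the calling code (the caller). */
bool should_inline(const struct routine_info *caller,
                   const struct routine_info *callee)
{
    return callee->estimated_size <= caller->estimated_size;
}
```

Because the test reads estimated_size for both routines, it can only run once a size or a size estimate exists for each of them, which is the timing constraint described above.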

Consideration must be given as to whether or not routines that are candidates to be inlined can themselves contain inlined routines. If the inlining were performed on-the-fly, the implementation resulting from allowing such recursion would be easier to accomplish using Java code rather than C code. If the inlining were performed only once, some ordering of the inlined routines would be required.
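
One possible ordering, shown here only as a sketch, is to process callees before their callers, so that a routine has already received its own inlined code by the time it is inlined elsewhere. The code below assumes a hypothetical call graph stored as an adjacency matrix and uses a depth-first post-order walk; it is not a description of what Scale does.

```c
#define MAX_ROUTINES 8

/* calls[a][b] != 0 means routine a calls routine b. This encoding is
   hypothetical and is not Scale's representation. */
static void visit(int r, int n, int calls[][MAX_ROUTINES],
                  int visited[], int order[], int *count)
{
    visited[r] = 1;
    for (int callee = 0; callee < n; callee++)
        if (calls[r][callee] && !visited[callee])
            visit(callee, n, calls, visited, order, count);
    order[(*count)++] = r;   /* recorded after the callees reached from it */
}

/* Fills order[] with routines 0..n-1 (n must not exceed MAX_ROUTINES)
   so that, when the call graph has no cycles, every callee appears
   before its callers. Returns the number of entries written. */
int inline_order(int n, int calls[][MAX_ROUTINES], int order[])
{
    int visited[MAX_ROUTINES] = {0};
    int count = 0;
    for (int r = 0; r < n; r++)
        if (!visited[r])
            visit(r, n, calls, visited, order, &count);
    return count;
}
```

If the call graph contains recursion, the walk still terminates because visited routines are skipped, but the callee-before-caller property no longer holds for the routines on the cycle, so those would need separate treatment.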

Inter- or Intra-Module Inlining

The question naturally arises of whether Scale should inline code from other modules into the module being compiled. Scale was built with the idea of performing inter-module optimizations. It has a command-line switch (-M) that selects inter-module compilation. What this means is that after a module is compiled, information about that module is retained while the other modules are compiled. For inlining, the information required depends to a large extent on when inlining is performed.

Obviously, the disadvantage of inter-module inlining is the retention of this information and its effect on the memory usage of the Scale compiler. Scale is already a memory hog. It was developed as a research compiler, and very little thought or effort has gone into making it memory efficient. This memory inefficiency is aggravated by the fact that Scale is written in a language that uses implicit memory management at run time. Inter-module inlining makes the memory usage worse.

If only intra-module inlining is performed, many, if not most, of the opportunities to inline would be unavailable.

It is not a simple matter to decide where or when inlining should be performed. There are many possibilities. For Scale we decided to perform inlining of the CFG just after it has been formed but before any optimizations have been performed.

(Last changed: March 21, 2007.)