Put Up or Shut Up John R. Ellis Xerox Palo Alto Research Center ellis@parc.xerox.com September 9, 1993 It's time for researchers to put up or shut up. For over twenty years, we've been touting the supposed benefits of garbage collection: It slices, it dices, it even cuts garden hose! But garbage collection still isn't available to the working programmer. If we really believe that GC improves programmer productivity and product quality, academic and industrial researchers need to get it into the hands of the average programmer. If we can't get commercial programmers using collection in the next several years, we should acknowledge failure and stop most GC research_collection may be a useful prototyping tool, but it just isn't viable for products. Why should the funding agencies and corporations subsidize another twenty years of garbage-collection research? We could bicker endlessly over why garbage collection hasn't been accepted up till now: The languages have been inappropriate, we focused on hardware-based solutions, the algorithms are too inefficient. Or instead, we can focus on what we can do, starting now, to integrate GC into the real world. In the past seven months, Dave Detlefs (DEC SRC) and I talked to C++ compiler and tool implementors at Microsoft, Sun, DEC, Purify, and Xerox. It's clear that they view GC as still too speculative, too inefficient, too unportable, too complicated for the average programmer. (Borland wouldn't even return my calls.) The existing language vendors aren't going to add GC to widely used programming environments unless we researchers work directly with them and address their immediate concerns. We must show them that GC can be practical for commercial programming. Now we have an opportunity that won't last long. Rightly or wrongly, the commercial world is adopting "object-oriented programming" with a vengeance: Microsoft, Borland, Sun, Apple, Next and dozens of smaller vendors are pushing object-oriented paradigms in their products. In support of this trend, C++ is rapidly becoming the primary implementation language. This long-delayed embrace of abstraction, encapsulation, and reusable components is making storage management much harder_even "small" desktop applications consists of hundreds of thousands of lines of source and have dozens of man-years invested in them. It's still early days. There will be several more years of fumbling and false starts, and nothing is set in concrete (yet). Despite the best efforts of the ANSI C++ committee, C++ is still evolving. We're being handed a perfect opportunity to persuade working programmers that GC should be an essential element in their standard C++ and C toolbox. Let's not blow it. We don't need any more research in base technology_it's already adequate for many commercial uses. What we do need is a lot of effort invested in hard research-engineering issues of how to integrate GC into the current programming environments used by working programmers. The language vendors aren't going to do this work, and even if they wanted to, they shouldn't. They don't have the knowledge or the experience. It's the rightful responsibility of researchers to help the vendors make the hard design decisions. But doesn't the work of Boehm and Bartlett on partially conservative collectors already provide complete prototypes for vendors to emulate? No. They address only some of the issues, leaving many others unanswered. The issues fall into three major categories: the language interface, suitable algorithms, and code- generator safety. Dave Detlefs and I have written a paper addressing these in great detail1, so I'll only summarize the questions: Language interface. How do programmers access garbage collection from within C++ and C? The syntax and semantics of the language interface must be standardized soon. Otherwise, application programmers can't write portable programs, and a product manager would be a fool to let his programmers use a non- portable, vendor-specific extension without a very strong reason. Are all the objects in a program garbage collected? That sounds bad, since we probably can't integrate garbage collection seamlessly into existing libraries without source modification, and since commercial programmers will still want to optimize resource- critical parts of their programs. Allowing both collected and non-collected objects will let GC be introduced incrementally into large systems, allaying the fears of managers. How do programmers specify which objects should be garbage collected? With type declarations, C++ placement syntax, or standardized class declarations? The solution for C++ will necessarily be different than for C. Type declarations require language changes; placement syntax violates the principle of encapsulation by forcing clients to decide how to allocate objects; standardized class declarations don't allow non-class types to be garbage collected. Can non-collected objects reference collected objects and vice versa? Prohibiting such cross-heap references will make the incremental introduction of collection much harder. Since the prohibition will be hard to enforce by convention and could lead to nasty, subtle bugs, it would probably need to be enforced automatically within the type system, necessitating non- trivial language changes. On the other hand, cross-heap references puts bigger demands on collector implementations. Can pointers to collected objects be passed to libraries written in other languages (C, Ada, Fortran, ...)? If not, that makes the incremental introduction of GC into large systems much harder, since there are existing reusable components written in those languages that commercial programmers aren't about to give up just for the privilege of using GC. But passing collected pointers to code written in other languages requires all these same language and implementation issues to be addressed for those languages as well. Doing so formally, with proper language specifications and working through the standards process, is a practical impossibility, though perhaps not a practical necessity. Can programmers create pointers to the interiors of objects? That's a common, though perhaps not necessary idiom in C++; it's essential in C (for example, to implement by-reference parameter passing). Prohibiting interior pointers is almost certainly unacceptable to large numbers of programmers, regardless of their objective usefulness. On the other hand, handling arbitrary interior pointers imposes significant costs on collectors, and these costs haven't received much attention from researchers. (All current C++ implementations use interior pointers internally not only for evaluating addressing expressions into stack and register temporaries, but also for representing source pointers to multiply inherited objects; such pointers can be stored in the heap.) What are the safety rules that programmers must follow to ensure correct operation of the collector with their program? It's clear that at least some restrictions must be placed on the de facto definitions of C++ and C as used by most programmers, and probably even the "formal" ANSI standard specifications need some restrictions. For example, can pointers be cast to integers and back again? Can pointers be stored in unions? Do programmers have the option of asking the language implementation to automatically enforce the safety rules in selected sections of their code, as they do with the safe subsets of Cedar, Modula-2+, and Modula-3? When programmers violate the safety rules, garbage collectors can make a total hash of the heap, and the resulting mess can be much harder to debug than if they were using explicit storage management. It's clear that systems programmers occasionally need to use language features whose use safe use can't be automatically enforced, but a safe subset allows programmers to identify which portions of their program can be automatically checked and which can't. How do programmers specify finalization (object clean-up)? The natural mechanism in C++ is to use destructors, but that raises religious issues that leaders in the C++ language community seems unwilling or unable to discuss substantively. What are the semantics of finalization? In collected environments such as Lisp, Cedar, and Modula, clean-ups get applied asynchronously with respect to other threads of execution, but that's problematic for most C++ and C implementations that have only one thread of control. What guarantees of clean-up ordering are provided to the programmer? Are clean-ups guaranteed to be invoked at all? Should there be multiple heaps, each with a different semantic or performance policy? Should the definitions of some or all of these multiple heaps be standardized? Research experience with multiple heaps has been very limited to date. Trying to introduce not- well-understood concepts into an industrial design is dangerous. Just getting plain-old-vanilla GC into the hands of programmers might be a big enough task with sufficient pay-off over many years. Suitable collection algorithms. Besides the traditional performance problems of garbage collection (latency and space and time overhead), practical C++/C garbage collection introduces more headaches. Depending on the answers to the questions above, commercial viable collectors face additional requirements: Interior pointers addressing the middle of objects. Cross-heap pointers from collected to non- collected objects. Unions containing collected pointers (C++ and C unions are untagged, a requirement imposed by their de facto language definitions, if not their formal definitions). Multi-lingual compatibility, allowing collected pointers to be passed to code written in other languages. Minimal changes to existing compilers and runtime environments. Vendors have limited budgets and overwhelming skepticism, and the fewer changes required by GC, the better. These requirements rule out many types of algorithms that researchers have investigated in the past. Partially conservative algorithms (those that scan stacks and registers conservatively and most of the heap precisely) are the most promising, especially the algorithms of Boehm and Bartlett. Virtual-memory synchronization may be the only practical way of introducing generational and incremental collection without modifying all the different language compilers used to produce a program. These algorithms are certainly feasible on Unix, OS/2, and Windows NT, but it's not clear how hard it is to implement them on traditional platforms such as Windows 3.1 and Macintosh. In my visits with vendors, they were most worried about the performance of garbage collection, especially partially conservative algorithms. The only thing that will allay their fears is substantive measurements of real commercial programs using garbage collection, something that we as researchers don't yet have. We may be confident that our years of experience with academic environments such as Lisp, Cedar, and Modula will carry over to commercial C++ and C programs, but without real data, the vendors will continue to ignore garbage collection. Measurements of small academic programs don't cut it. (Small is anything less than 100,00 lines that runs for less than a few minutes.) Microsoft couldn't care less about a 25,000 line VLSI routing program. Getting commercial-scale benchmarks is very hard, not least because vendors treat their applications' source code as family jewels. When I was at DEC SRC, I initiated a project to measure the performance of some large DEC programs modified to use garbage collection, and I got Ben Zorn a research grant that would allow him to use those programs for his own university research. We need more such industry-research cooperation. I suspect the most challenging remaining algorithmic problems are not total execution time, but rather latency and space overhead. A 15% increase in execution time isn't an issue for many commercial applications, but long pauses or a factor-of-two increase in total memory are unacceptable. We don't have a good story to tell about memory overhead with large, long-running programs. Code-generator safety. Partially conservative algorithms such as the Boehm and Bartlett collectors are designed to work with existing C++ and C compilers. Unfortunately, most good compilers can generate code that will cause incorrect operation of garbage collections. With many compilers, this only happens with optimization turned on, and rarely even then. To be safe, researchers using these collectors routinely turn off optimization or accept the rare possibility of a garbage collection bug (in practice, their programs are likely to have many more non-GC-related bugs). But commercial compilers need a more robust solution. Implementing code-generator safety is straightforward and well understood, but retrofitting existing commercial optimizing compilers can be very difficult. Of all the changes Detlefs and I proposed in our talks to compiler vendors, code-generator safety was the only one that they thought would take any appreciable effort to implement. Vendors were also concerned about the runtime costs of code-generator safety, since the straightforward techniques sometimes (but infrequently) generate extra instructions and use an extra register. Our limited research experience to date with code-generator safety indicates that in practice the overhead is insignificant, but we have no substantive measurements, especially on Intel architectures in which register pressure is a severe problem. The issue needs more attention from researchers before vendors will be satisfied. Conclusion. Of all the ways researchers have studied for improving programmer productivity, getting garbage collection into the hands of working programmers could yield the biggest bang for a small buck. But we can't expect vendors to come up with acceptable language and implementation designs without the closer cooperation of researchers. It's unlikely that any vendor will do any of this design work at all. They won't want to invest their limited resources until researchers 1) gather more believable measurements of commercially realistic benchmarks, and 2) build consensus about the language design and implementation issues and construct prototypes that address the concerns of vendors and their customers. The problems I've raised here may not seem sexy in comparison with the traditional longer-term problems that many GC researchers like to focus on. But vendors aren't going to the solve the problems, and unless we do, garbage collection will remain an esoteric domain that's rightly perceived to be of little short- or long- term value to industry. Funding agencies and corporate management will feel justified in reducing or even eliminating funding. In the longer term, commercializing garbage collection will make GC research more interesting and more exciting. Researchers in other fields such as operating systems, file systems, and networking have large-scale commercial systems that they can measure, analyze, improve, and sometimes revolutionize. But most garbage-collection research is stale and boring, because we don't have that real-world connection. _______________________________ 1 John R. Ellis and David L. Detlefs. "Safe, Efficient Garbage Collection for C++". Systems Research Center Report 102, Digital Equipment Corporation, 1993. Also available by anonymous ftp from parcftp.xerox.com:/pub/ellis/gc.