Source code: Will be available August 5, 2008, on the Jikes RVM Research Archive
Abstract:
Type safety and garbage collection in managed languages eliminate memory errors such as dangling pointers, double frees, and leaks of unreachable objects. Unfortunately, a program still leaks memory if it maintains references to objects it will never use again. Leaked objects decrease program locality and increase garbage collection frequency and workload, and a growing leak will eventually exhaust memory and crash the program.
This paper introduces leak tolerance, which safely eliminates performance degradations and crashes due to leaks of stale objects in managed languages given sufficient disk space. Leak tolerance (1) identifies stale objects that the program is not accessing; (2) segregates in-use and stale objects, and stores stale objects to disk; and (3) activates stale objects if the program subsequently accesses them. Activation makes leak tolerance completely safe. We design and implement a prototype leak tolerance tool called Melt in a Java VM and show it adds overhead low enough for production systems. Our results show that existing VMs grind to a halt and then crash on programs with leaks, whereas Melt keeps many of these programs running much longer without significantly degrading performance. Leak tolerance provides users the illusion of no bug and developers more time to fix leaky programs.
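The three steps of leak tolerance can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the class `LeakToleranceSketch`, its methods, the per-object `touched` flag, and the in-memory map standing in for disk are all hypothetical and are not Melt's actual mechanism, which works inside the VM's garbage collector.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class LeakToleranceSketch {
    /** A heap object plus a flag recording whether the program touched it recently. */
    private static final class Slot {
        final String payload;
        boolean touched = true; // newly allocated objects count as in use
        Slot(String payload) { this.payload = payload; }
    }

    private final Map<Integer, Slot> inUse = new HashMap<>();
    // Assumption: a map stands in for on-disk storage to keep the sketch self-contained.
    private final Map<Integer, String> disk = new HashMap<>();

    void allocate(int id, String payload) {
        inUse.put(id, new Slot(payload));
    }

    /** Step 3: accessing a stale object activates it (moves it back into memory). */
    String access(int id) {
        Slot s = inUse.get(id);
        if (s == null) {
            String payload = disk.remove(id);
            if (payload == null) throw new IllegalArgumentException("unknown object " + id);
            s = new Slot(payload);
            inUse.put(id, s);
        }
        s.touched = true;
        return s.payload;
    }

    /** Steps 1 and 2: objects untouched since the previous collection are moved to "disk". */
    void collect() {
        Iterator<Map.Entry<Integer, Slot>> it = inUse.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Slot> e = it.next();
            Slot s = e.getValue();
            if (s.touched) {
                s.touched = false;              // give it one more cycle
            } else {
                disk.put(e.getKey(), s.payload); // stale: segregate and store
                it.remove();
            }
        }
    }

    int inUseCount() { return inUse.size(); }
    int diskCount() { return disk.size(); }

    public static void main(String[] args) {
        LeakToleranceSketch heap = new LeakToleranceSketch();
        heap.allocate(1, "live");
        heap.allocate(2, "leaked");
        heap.collect();          // both objects were just allocated, so both stay
        heap.access(1);
        heap.collect();          // object 2 was not accessed, so it moves out
        System.out.println("in use: " + heap.inUseCount() + ", on disk: " + heap.diskCount());
        System.out.println("activated: " + heap.access(2));
    }
}
```

Because every access path checks the stand-in disk before failing, a wrongly classified object costs only an activation, which mirrors the abstract's point that activation is what makes the approach safe.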
Source code: Available for download from the Jikes RVM Research Archive
Jones and Ryder used our implementation for context-sensitive allocation sites in their ISMM 2008 paper.
Abstract:
Calling context enhances program understanding and dynamic analyses by providing a rich representation of program location. Compared to imperative programs, object-oriented programs use more interprocedural and less intraprocedural control flow, increasing the importance of context sensitivity for analysis. However, prior online methods for computing calling context, such as stack-walking or maintaining the current location in a calling context tree, are expensive in time and space. This paper introduces a new online approach called probabilistic calling context (PCC) that continuously maintains a probabilistically unique value representing the current calling context. For millions of unique contexts, a 32-bit PCC value has few conflicts. Computing the PCC value adds 3% average overhead to a Java virtual machine. PCC is well-suited to clients that detect new or anomalous behavior since PCC values from training and production runs can be compared easily to detect new context-sensitive behavior; clients that query the PCC value at every system call, Java utility call, and Java API call add 0-9% overhead on average. PCC adds space overhead proportional to the distinct contexts stored by the client (one word per context). Our results indicate PCC is efficient and accurate enough to use in deployed software for residual testing, bug detection, and intrusion detection.
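The idea of a continuously maintained context value can be sketched as follows. The abstract does not state the update function; the sketch assumes one of the form f(V, cs) = 3 * V + cs over 32-bit integers, and the names `PccSketch`, `next`, and `contextOf`, as well as the call-site ids, are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class PccSketch {
    /** Assumed update function: combines the caller's value with a call-site id (wraps at 32 bits). */
    static int next(int v, int callSiteId) {
        return 3 * v + callSiteId;
    }

    /** Value for a whole call chain, outermost call site first. */
    static int contextOf(int... callSiteIds) {
        int v = 0;
        for (int cs : callSiteIds) {
            v = next(v, cs);
        }
        return v;
    }

    public static void main(String[] args) {
        // A client stores one 32-bit word per distinct context seen in training.
        Set<Integer> training = new HashSet<>();
        training.add(contextOf(1, 2));
        training.add(contextOf(1, 3));

        // The same call sites in a different order yield a different value,
        // so a production run can flag the context as new.
        int production = contextOf(2, 1);
        System.out.println("new context? " + !training.contains(production));
    }
}
```

In an instrumented method, the caller's value would be saved in a local, updated with `next` before each call, and restored afterwards; the loop in `contextOf` only replays that sequence for illustration.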
Source code: Both implementations are publicly available:
svn://svn.valgrind.org/valgrind/branches/ORIGIN_TRACKING
Programs sometimes crash due to unusable values, for example, when Java and C# programs dereference null pointers and when C and C++ programs use undefined values to affect program behavior. A stack trace produced on such a crash identifies the effect of the unusable value, not its cause, and is often not much help to the programmer.
This paper presents efficient origin tracking of unusable values; it shows how to record where these values come into existence, correctly propagate them, and report them if they cause an error. The key idea is value piggybacking: when the original program stores an unusable value, value piggybacking instead stores origin information in the spare bits of the unusable value. Modest compiler support alters the program to propagate these modified values through operations such as assignments and comparisons. We evaluate two implementations: the first tracks null pointer origins in a JVM, and the second tracks undefined value origins in a memory-checking tool built with Valgrind. These implementations show that origin tracking via value piggybacking is fast and often useful, and in the Java case, has low enough overhead for use in a production environment.
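Value piggybacking for null references can be sketched by simulating references as 32-bit integers. The layout here is an assumption made for illustration: any value whose top five bits are zero is treated as null, and the remaining 27 bits carry an origin id. The class and method names are hypothetical.

```java
final class OriginTrackingSketch {
    // Assumption: no real object lives in the lowest 1/32 of the address space,
    // so every value with the top five bits clear can stand for null.
    static final int NULL_MASK = 0xF8000000;
    static final int MAX_ORIGIN = ~NULL_MASK; // 0x07FFFFFF, 27 spare bits

    /** Creates a null value that remembers where it was produced. */
    static int nullFrom(int originId) {
        if (originId < 0 || originId > MAX_ORIGIN) {
            throw new IllegalArgumentException("origin id does not fit in the spare bits");
        }
        return originId;
    }

    /** Replacement for the test `ref == null`. */
    static boolean isNull(int ref) {
        return (ref & NULL_MASK) == 0;
    }

    static int originOf(int ref) {
        return ref & MAX_ORIGIN;
    }

    /** Replacement for `a == b`: two nulls are equal even if their origins differ. */
    static boolean refEquals(int a, int b) {
        return (isNull(a) && isNull(b)) || a == b;
    }

    /** Simulated dereference that reports the origin when it fails. */
    static void dereference(int ref) {
        if (isNull(ref)) {
            throw new NullPointerException("null value originated at site " + originOf(ref));
        }
    }

    public static void main(String[] args) {
        int ref = nullFrom(42);   // the program "stores null" at site 42
        int copy = ref;           // plain assignment propagates the origin for free
        try {
            dereference(copy);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Assignments need no extra work because the origin travels inside the value itself; only operations that inspect nullness, such as comparisons, have to be rewritten.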
Source code: Available for download from the Jikes RVM Research Archive
Abstract:
To reason about programs, dynamic optimizers and analysis tools use sampling to collect a dynamic call graph (DCG). However, sampling has not achieved high accuracy with low runtime overhead. As object-oriented programmers compose increasingly complex programs, inaccurate call graphs will inhibit analysis and optimizations. This paper demonstrates how to use static and dynamic control flow graph (CFG) constraints to improve the accuracy of the DCG. We introduce the frequency dominator (FDOM), a novel CFG relation that extends the dominator relation to expose static relative execution frequencies of basic blocks. We combine conservation of flow and dynamic CFG basic block profiles to further improve the accuracy of the DCG. Together these approaches add minimal overhead (1%) and achieve 85% accuracy compared to a perfect call graph for SPEC JVM98 and DaCapo benchmarks. Compared to sampling alone, accuracy improves by 12 to 36%. These results demonstrate that static and dynamic control-flow information together provide an accurate, efficient basis for improving the DCG.
Source code: Available for download from the Jikes RVM Research Archive
Tang, Gao, and Qin modified our implementation for their USENIX 2008 paper.
Abstract:
Memory leaks compromise availability and security by crippling performance and crashing programs. Leaks are difficult to diagnose because they have no immediate symptoms. Online leak detection tools benefit from storing and reporting per-object sites (e.g., allocation sites) for potentially leaking objects. In programs with many small objects, per-object sites add high space overhead, limiting their use in production environments.
This paper introduces Bit-Encoding Leak Location (Bell), a statistical approach that encodes per-object sites to a single bit per object. A bit loses information about a site, but given sufficient objects that use the site and a known, finite set of possible sites, Bell uses brute-force decoding to recover the site with high accuracy.
We use this approach to encode object allocation and last-use sites in Sleigh, a new leak detection tool. Sleigh detects stale objects (objects unused for a long time) and uses Bell decoding to report their allocation and last-use sites. Our implementation steals four unused bits in the object header and thus incurs no per-object space overhead. Sleigh's instrumentation adds 29% execution time overhead, which adaptive profiling reduces to 11%. Sleigh's output is directly useful for finding and fixing leaks in SPEC JBB2000 and Eclipse, although sufficiently many objects must leak before Bell decoding can report sites with confidence. Bell is suitable for other leak detection approaches that store per-object sites, and for other problems amenable to statistical per-object metadata.
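The encode and brute-force decode steps can be sketched as follows. The mixing function, the class `BellSketch`, and the integer object and site ids are assumptions chosen for illustration; the abstract does not specify the encoding function Bell uses.

```java
final class BellSketch {
    /** Hypothetical encoding function f(site, object): one well-mixed bit per pair. */
    static int encodeBit(int siteId, int objectId) {
        int h = (siteId * 0x9E3779B9) ^ objectId;
        // 32-bit finalizer-style mixing so the low bit depends on all input bits.
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h & 1;
    }

    /** How many objects carry the bit that the candidate site would have stored. */
    static int matches(int candidateSite, int[] objectIds, int[] storedBits) {
        int count = 0;
        for (int i = 0; i < objectIds.length; i++) {
            if (encodeBit(candidateSite, objectIds[i]) == storedBits[i]) {
                count++;
            }
        }
        return count;
    }

    /** Brute-force decoding over a known, finite set of sites 0..numSites-1. */
    static int decode(int numSites, int[] objectIds, int[] storedBits) {
        int best = -1;
        int bestCount = -1;
        for (int site = 0; site < numSites; site++) {
            int c = matches(site, objectIds, storedBits);
            if (c > bestCount) {
                bestCount = c;
                best = site;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int trueSite = 7;
        int[] ids = new int[200];
        int[] bits = new int[200];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = 0x1000 + 8 * i;               // stand-in for object identities
            bits[i] = encodeBit(trueSite, ids[i]); // the single bit kept per object
        }
        System.out.println("decoded site: " + decode(16, ids, bits));
    }
}
```

The true site agrees with every stored bit, while an unrelated site is expected to agree with only about half of them, which is why decoding needs a sufficiently large group of objects before it can report a site with confidence.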
Source code: Available for download from the Jikes RVM Research Archive
Abstract:
Microarchitectures increasingly rely on dynamic optimization to improve performance in ways that are difficult or impossible for ahead-of-time compilers. Dynamic optimizers in turn require continuous, portable, low cost, and accurate control-flow profiles to inform their decisions, but prior approaches have struggled to meet these goals simultaneously.
This paper presents PEP, a hybrid instrumentation and sampling approach for continuous path and edge profiling that is efficient, accurate, and portable. PEP uses a subset of Ball-Larus path profiling to identify paths with low overhead, and uses sampling to mitigate the expense of storing paths. PEP further reduces overhead by using profiling to guide instrumentation placement. PEP improves profile accuracy with a modified version of Arnold-Grove sampling. The resulting system has 1.2% average and 4.3% maximum overhead, 94% path profile accuracy, and 96% edge profile accuracy on a set of Java benchmarks.
Source code: Available as part of the Scale compiler. Please contact me if you're interested in using the same version we used in the paper.
Vaswani, Nori, and Chilimbi modified our path profiling implementation for their POPL 2007 and FSE 2007 papers.
Abstract:
Modern processors are hungry for instructions. To satisfy them, compilers need to find and optimize execution paths across multiple basic blocks. Path profiles provide this context, but their high overhead has so far limited their use by dynamic compilers. We present new techniques for low-overhead online practical path profiling (PPP). Following targeted path profiling (TPP), PPP uses an edge profile to simplify path profile instrumentation (profile-guided profiling). PPP improves over prior work by (1) reducing the amount of profiling instrumentation on cold paths and paths that the edge profile predicts well and (2) reducing the cost of the remaining instrumentation.
Experiments in an ahead-of-time compiler perform edge profile-guided inlining and unrolling prior to path profiling instrumentation. These transformations are faithful to staged optimization, and create longer, harder-to-predict paths. We introduce the branch-flow metric to measure path flow as a function of branch decisions, rather than weighting all paths equally as in prior work. On SPEC2000, PPP maintains high accuracy and coverage, but has only 5% overhead on average (ranging from -3% to 13%), making it appealing for use by dynamic compilers.
Source code: Subsumed by the Practical Path Profiling source code. Please contact me if you're interested in the original Targeted Path Profiling implementation.
Abstract:
In this paper, we present a technique for reducing the overhead of collecting path profiles in the context of a dynamic optimizer. The key idea of our approach, called Targeted Path Profiling (TPP), is to use an edge profile to simplify the collection of a path profile. This notion of profile-guided profiling is a natural fit for dynamic optimizers, which typically optimize the code in a series of stages.
TPP is an extension to the Ball-Larus Efficient Path Profiling algorithm. Its increased efficiency comes from two sources: (i) reducing the number of potential paths by not enumerating paths with cold edges, allowing array accesses to be substituted for more expensive hash table lookups, and (ii) not instrumenting regions where paths can be unambiguously derived from an edge profile. Our results suggest that on average the overhead of profile collection can be reduced by half (SPEC95) to almost two-thirds (SPEC2000) relative to the Ball-Larus algorithm with minimal impact on the information collected.
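The path numbering that TPP builds on, and the effect of dropping a cold edge, can be sketched as follows. This is a minimal illustration of Ball-Larus style numbering on an acyclic graph, not the TPP implementation; the class `PathNumberingSketch`, the example graph, and the choice of which edge is cold are hypothetical, and nodes are assumed to be numbered in topological order.

```java
final class PathNumberingSketch {
    final int[][] succ;     // succ[v] lists the successors of node v
    final int[] numPaths;   // numPaths[v] = number of paths from v to an exit
    final int[][] edgeVal;  // edgeVal[v][i] = increment on the i-th out-edge of v

    /** Assumes an acyclic graph whose node numbers are already in topological order. */
    PathNumberingSketch(int[][] succ) {
        this.succ = succ;
        int n = succ.length;
        numPaths = new int[n];
        edgeVal = new int[n][];
        for (int v = n - 1; v >= 0; v--) {   // reverse topological order
            edgeVal[v] = new int[succ[v].length];
            if (succ[v].length == 0) {
                numPaths[v] = 1;             // an exit node ends exactly one path
                continue;
            }
            for (int i = 0; i < succ[v].length; i++) {
                edgeVal[v][i] = numPaths[v]; // paths already claimed by earlier edges
                numPaths[v] += numPaths[succ[v][i]];
            }
        }
    }

    /** Sums the edge increments along a path; each path gets a unique id in [0, numPaths[entry]). */
    int pathId(int... nodes) {
        int id = 0;
        for (int k = 0; k + 1 < nodes.length; k++) {
            id += edgeVal[nodes[k]][indexOf(succ[nodes[k]], nodes[k + 1])];
        }
        return id;
    }

    private static int indexOf(int[] successors, int target) {
        for (int i = 0; i < successors.length; i++) {
            if (successors[i] == target) return i;
        }
        throw new IllegalArgumentException("no such edge");
    }

    public static void main(String[] args) {
        int[][] full = { {1, 2}, {2, 3}, {3}, {4, 5}, {5}, {} };
        // Hypothetical: the edge profile says 1 -> 3 is cold, so it is not enumerated.
        int[][] pruned = { {1, 2}, {2}, {3}, {4, 5}, {5}, {} };
        PathNumberingSketch all = new PathNumberingSketch(full);
        PathNumberingSketch hot = new PathNumberingSketch(pruned);
        System.out.println("paths with all edges: " + all.numPaths[0]);
        System.out.println("paths without the cold edge: " + hot.numPaths[0]);
        System.out.println("id of 0-2-3-5: " + all.pathId(0, 2, 3, 5));
    }
}
```

Fewer enumerated paths means a smaller id range, which is what allows path counters to live in a directly indexed array rather than a hash table.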