This revision represents a substantial change from the previous version; we have incorporated most of the suggestions of the reviewers. Below, we give a list of the main changes. In addition, we have added and/or corrected references and fixed the minor comments raised by the referees. We have made several structural changes in this version, including the addition of an "Overall Organization" section, the merging of the sections on enumerative model checking and execution exploration, and the expansion of several sections (including the Conclusion, which now also lists open research directions).

> 1. I find the coverage of topics rather nonuniform. This is
> unavoidable in a survey such as this one, and I suppose that authors
> have editorial discretion in such circumstances. However, some
> sections of the paper (eg, page 5, 32, 33) are too brief to be
> appreciated as anything more than an annotated bibliography, whereas
> certain others (eg. pages 18-23) are covered in quite a bit of detail.
> You could consider just dropping certain topics in places where they
> degenerate into a list of references and use the space to cover other
> topics in more detail.

We have condensed some parts that were essentially pointers to papers (e.g., the discussion of external-memory model checking) and added more information elsewhere (e.g., for compositional model checking).

> 2. There is very little discussion about limitations of each technique
> and tool. This could be useful to graduate students looking for new
> research topics.

The Conclusion section lists the limitations of current tools.

> 1. A definition of the terms "sound" and "complete" as they are used
> in this paper would be helpful towards the beginning. There are many
> uses of these terms, as well as related ones - such as "proving
> correctness" versus "finding bugs", or "over-" versus
> "underapproximation" - and their precise meanings and relationships
> are not always clear. It should be stated for each method into which
> category it falls. In particular the sources of unsoundness and
> incompleteness that are related specifically to software deserve more
> attention - issues such as arbitrary precision versus bitlevel
> representations of integers. Tools differ considerably on how they
> approach such issues and would consequently give different results on
> the very same input program. In the same vain it would be helpful to
> state clearly the possible reasons for failure of each method - not
> only timeout or spaceout but possible reasons for nontermination,
> failure to make progress (in finding an invariant or refining an
> abstraction) etc. Main problems that are specific to software are
> unbounded control (loops), unbounded data (heap), and unbounded
> function calls (stack) and again each tool deals with them
> differently.

We have added these definitions and have tried to include a discussion of these issues.
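To make the distinction concrete, here is a toy fragment of our own (the identifiers and bounds are invented for illustration and are not drawn from the survey or from any particular tool). It shows both directions of approximation on the same loop: over-approximation can raise false alarms but misses no bugs, whereas under-approximation reports only real bugs but can miss some.

    #include <assert.h>

    void example(unsigned n) {
        unsigned i = 0, sum = 0;
        while (i < n) {        /* unbounded loop: the core difficulty */
            sum += 2;
            i++;
        }
        /* Over-approximation: an abstract domain that loses the relation
           sum == 2*i (e.g., plain intervals) may report this assertion as
           possibly violated -- a false alarm, but no bug is ever missed,
           so the method is sound for proving correctness. */
        assert(sum == 2 * i);

        /* Under-approximation: unrolling the loop only, say, 10 times
           explores only executions with n <= 10, so the real violation
           below (reachable for n >= 50) is missed; however, any alarm
           such a method does raise corresponds to a real bug. */
        assert(sum < 100);
    }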
> 2. The authors are right in saying (in the introduction) that
> "software model checking" is a misnomer, yet their survey is
> remarkably model checking centric. While almost every line of
> research in model checking is mentioned, whether useful or not for
> software (compositional methods have hardly proved useful to date in
> software), there is comparatively little detail on decision procedures
> (which is perhaps the most essential technology to make model checking
> work for software) or on dataflow analysis (which has an equally long
> and influential history).

We have a model-checking bias, as the survey was written from a model-checking perspective. We omitted decision procedures because they are the topic of a separate and complementary survey.

> Especially the distinction between path/flow-sensitive and
> -insensitive analyses is fundamental and should be explained (together
> with corresponding algorithmic issues such as joins of abstract
> elements).

We have added some of this discussion.

> 4. Some important software issues have received very little attention
> in model checking, such as dynamically created processes, object
> oriented structures, calls to library functions, incremental
> verification etc. It would be nice to at least point this out and
> list a few challenges.

We have added this discussion to the Conclusion section.

> 5. It is not clear how model checking by systematic execution (Section
> 3) differs from concrete enumerative model checking (Section 2). If
> your point is that the methods in Section 2 abstract the full system
> state, then they are not "concrete". Isn't stateless search just an
> unsound optimization much like bitstate hashing and others? It seems
> also bounded model checking would fall into this category, and the
> issue wether symbolic representations or SAT formulations are used
> seems orthogonal.

Point well taken! We have tried to unify the presentation and have merged the sections on enumerative model checking and systematic execution. We kept bounded model checking where it was, since we distinguish "enumerative" techniques (which examine individual states) from "symbolic" techniques (which consider sets of states together).

> An alternative classification would be according to the three
> orthogonal dimensions of programming constructs (functions, arrays,
> pointers etc.), abstraction level (execution of real program, or
> idealized concrete program, or various abstract programs), and
> reasoning engine (SAT/BDDs, various unquantified nonboolean theories,
> universal quantification). This would result in a more daring and
> unconventional, but ultimately more interesting presentation.

This is an interesting suggestion, but we could not implement it, as it would require a complete rewrite of the survey.

> 1. Section 4.4, Page 12, Bounded Model Checking: I think it is not
> correct to say that BMC is also called symbolic execution. They are
> similar. However, traditionally, symbolic execution has been used
> along a single or a subset of control paths. In contrast,
> traditional BMC is symbolic even on control paths, i.e. handles
> them all in a single constraint problem. Indeed, both CBMC and
> F-Soft tools handle all control paths up to a certain depth (of
> unrolling) in one single problem. This is made possible due to the
> conflict-driven learning and sharing implemented in modern SAT
> solvers. This sharing is either missed by symbolic execution, or
> has to be explicitly arranged (by using incremental solvers, for
> example) when solving multiple paths in an enumerative
> fashion. Furthermore, BMC can check liveness properties, but
> symbolic execution usually cannot.

This is a good point. We have added a discussion of symbolic execution versus BMC, but we present both in the same part of the survey because, to us, they are manifestations of the same underlying idea based on constraint solving.
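As a rough sketch of the contrast (the fragment below is our own illustration, not drawn from CBMC, F-Soft, or any other tool), BMC folds all control paths up to the unrolling depth into one constraint problem, whereas classical symbolic execution issues one solver query per path:

    #include <assert.h>

    int abs_value(int x) {
        int y;
        if (x > 0)             /* two control paths meet at the assertion */
            y = x;
        else
            y = -x;
        assert(y >= 0);
        return y;
    }

    /* BMC merges both paths into a single verification condition, solved
       in one SAT/SMT query:
           (x > 0  ->  y = x)  /\  (x <= 0  ->  y = -x)  /\  !(y >= 0)
       Classical symbolic execution explores the paths one by one, with
       one query per path:
           path 1:  x > 0   /\  y = x    /\  !(y >= 0)
           path 2:  x <= 0  /\  y = -x   /\  !(y >= 0)
       In the single-query formulation, clauses learned by the solver are
       shared across paths automatically; with per-path queries this
       sharing has to be arranged explicitly (e.g., via an incremental
       solver), which is the reviewer's point above. */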
> 2. Section 4.4, Page 13: I am not sure that CBMC and Saturn use SMT
> solvers at all. As far as I know, CBMC [the 2003 reference] uses only
> SAT solvers, and Saturn uses CBMC (with the SAT solver). References
> for SMT-based BMC are:
> - L. de Moura, H. Rue{\ss}, and M. Sorea. Lazy Theorem Proving for
>   Bounded Model Checking over Infinite Domains. In Proceedings of
>   CADE, 2002.
> - F-Soft: M. K. Ganai and A. Gupta. Accelerating High-level Bounded
>   Model Checking. In Proceedings of the IEEE International Conference
>   on Computer-Aided Design (ICCAD), November 2006.
> - A. Armando, J. Mantovani, and L. Platania. Bounded Model Checking of
>   Software using SMT Solvers instead of SAT Solvers. In Proceedings of
>   the 13th SPIN Workshop, LNCS 3925, Springer Verlag.

Fixed.

> 3. Section 4.5, Page 13: "One way of ensuring termination is through
> invariants". This begs the question: What are the other ways?!
> Actually, it would be nice to provide a high-level perspective on the
> crucial problems associated with software model checking,
> e.g. ensuring termination, controlling size of representations,
> etc. Although this already drives your narrative, perhaps you could
> make it more explicit and draw connections to the different sections
> in your organization.

Fixed.

> 4. Page 21 and 22: You mention the first advantage of interpolation is
> that it can discover local predicates. I agree locality of
> predicates is a good feature, and definitely provides a performance
> benefit over global predicates. However, other techniques, such as
> program invariants derived by abstract interpretation also provide
> locality wrt control locations. For example, F-Soft uses locality
> on top of weakest pre-condition based analysis and static
> invariants to discover local atomic predicates. It also uses the
> SAT solver's proof of unsatisfiability to pick up only the relevant
> predicates among those suggested by weakest
> pre-conditions. Similarly, the second advantage of interpolation
> that you mention is restricting the language of predicates. Again,
> this advantage is shared by other techniques, such as abstract
> domain analysis.
>
> I understand that you are trying to contrast interpolation with purely
> "syntactic" approaches for refinement, but the state of the art is to
> combine semantic analysis (interpolants, program invariants, proofs of
> unsatisfiability, etc) with syntactic approaches. It is clearly
> sub-par to use only the latter, and perhaps this should be clarified.

We have added this discussion in the section on counterexample refinement.

> -- An introduction laying out the structure of the rest of the paper
> seems to be missing. This should definitely be added.

Fixed.

> -- The authors seem to skirt the issue of soundness w.r.t. the
> semantics of real programming languages. In practice, this is a very
> big issue, and needs elaboration. In particular, it would be nice to
> see how the techniques described in the paper deal with pointers,
> low-level bit-wise semantics, and heaps, if at all.

We have added some comments on this.
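To give one concrete flavor of the issue (the example below is our own and the stated verdicts are generic, not tied to any particular tool), the choice of integer model alone can flip the result reported for the same program:

    #include <assert.h>

    int midpoint(int lo, int hi) {
        /* assumed precondition: 0 <= lo <= hi */
        int mid = (lo + hi) / 2;        /* lo + hi may exceed INT_MAX */
        /* With mathematical (arbitrary-precision) integers the assertion
           holds. With 32-bit two's-complement semantics, lo + hi can
           overflow (e.g., lo = 1, hi = INT_MAX), making mid negative and
           the assertion fail. Tools that model integers differently will
           therefore disagree on this one program. */
        assert(lo <= mid && mid <= hi);
        return mid;
    }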