Littlewood, B., Popov, P. and Strigini, L. Design Diversity: an Update from Research on Reliability Modelling, 2001.
abstract:
Diversity between redundant subsystems is, in various forms, a common design approach for improving system dependability. Its value in the case of software-based systems is still controversial. This paper gives an overview of reliability modelling work we carried out in recent projects on design diversity, presented in the context of previous knowledge and practice. These results provide additional insight for decisions in applying diversity and in assessing diverse-redundant systems. A general observation is that, just as diversity is a very general design approach, the models of diversity can help conceptual understanding of a range of different situations. We summarise results in the general modelling of common-mode failure, in inference from observed failure data, and in decision-making for diversity in development.
Bev Littlewood and Peter Popov and Lorenzo Strigini. Modelling software design diversity: a review, ACM Computing Surveys 33(2):177-208, June 2001.
abstract:
Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. While there is clear evidence that the approach can be expected to deliver some increase in reliability compared to a single version, there is no agreement about the extent of this. More importantly, it remains difficult to evaluate exactly how reliable a particular diverse fault-tolerant system is. This difficulty arises because assumptions of independence of failures between different versions have been shown to be untenable: assessment of the actual level of dependence present is therefore needed, and this is difficult. In this tutorial, we survey the modeling issues here, with an emphasis upon the impact these have upon the problem of assessing the reliability of fault-tolerant systems. The intended audience is one of designers, assessors, and project managers with only a basic knowledge of probabilities, as well as reliability experts without detailed knowledge of software, who seek an introduction to the probabilistic issues in decisions about design diversity.
Derek Partridge and Wojtek Krzanowski. Distinct Failure Diversity in Multiversion Software, 1997.
abstract:
In earlier studies of multiversion programming, both empirical and analytical, emphasis switched from notions of independence to one of minimization of coincident failure. We show that neither independence of failure, nor lack of coincident failure are the single important properties. Indeed, an N-version system may deliver an optimal performance (under some voting strategy) even when the incidence of coincident failure is arbitrarily high. The key notion that this study contributes is one of distinct different failure, and hence distinct-failure diversity. The important property is not whether versions fail on the same input so much as whether they fail in the same way. If the failures of an N-version system (on some input) are dispersed over a set of distinct alternative outcomes, then this (hitherto unacknowledged) aspect of diversity may be exploited to substantially enhance system reliability.
(...)
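The point about failures being dispersed over distinct outcomes can be illustrated with a small sketch. It assumes plurality voting as the voting strategy and uses made-up version outputs; neither the function name nor the data comes from the paper.

```python
from collections import Counter


def plurality_vote(outputs):
    """Return the most common output, or None when the top count is tied.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = Counter(outputs).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no unique plurality winner
    return counts[0][0]


# Five hypothetical versions; the correct answer is "A".
# Three versions fail, but each fails differently, so the two
# correct versions still form the largest group.
dispersed = ["A", "A", "B", "C", "D"]

# Same number of failures, but all three versions fail identically.
identical = ["A", "A", "B", "B", "B"]

print(plurality_vote(dispersed))  # A
print(plurality_vote(identical))  # B
```

In both cases three of five versions fail on the same input, so coincident failure is equally high; only the way the failures are distributed over outcomes differs. A strict majority voter would reject the first case as well, which is why the choice of voting strategy matters here.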
John C. Knight and Nancy G. Leveson. An Experimental Evaluation Of The Assumption Of Independence In Multi-Version Programming, 1986.
abstract:
N-version programming has been proposed as a method of incorporating fault tolerance into software. Multiple versions of a program (i.e. "N") are prepared and executed in parallel. Their outputs are collected and examined by a voter, and, if they are not identical, it is assumed that the majority is correct. This method depends for its reliability improvement on the assumption that programs that have been developed independently will fail independently. In this paper an experiment is described in which the fundamental axiom is tested. A total of twenty seven versions of a program were prepared independently from the same specification at two universities and then subjected to one million tests. The results of the tests revealed that the programs were individually extremely reliable but that the number of tests in which more than one program failed was substantially more than expected. The results of these tests are presented along with an analysis of some of the faults that were found in the programs. Background information on the programmers used is also summarized. The conclusion from this experiment is that N-version programming must be used with care and that analysis of its reliability must include the effect of dependent errors.
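As a rough sketch of the two ideas in this abstract, the code below shows a majority voter and the number of multi-version failures one would expect if versions really did fail independently. The function names and the failure probabilities are hypothetical and chosen for illustration; they are not the experiment's data or its analysis method.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the output produced by more than half the versions, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def expected_coincident_failures(failure_probs, n_tests):
    """Expected number of tests on which at least two versions fail,
    ASSUMING the versions fail independently of one another."""
    # Probability that no version fails on a given test.
    p_none = 1.0
    for p in failure_probs:
        p_none *= 1.0 - p

    # Probability that exactly one version fails on a given test.
    p_exactly_one = 0.0
    for i, p in enumerate(failure_probs):
        others_succeed = 1.0
        for j, q in enumerate(failure_probs):
            if j != i:
                others_succeed *= 1.0 - q
        p_exactly_one += p * others_succeed

    return n_tests * (1.0 - p_none - p_exactly_one)


# Hypothetical figures: three versions, each failing on 0.1% of inputs.
print(majority_vote([42, 42, 41]))
print(expected_coincident_failures([0.001, 0.001, 0.001], 1_000_000))
```

An observed count of coincident failures well above the value computed this way is the kind of evidence that argues against the independence assumption.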
Mladen A. Vouk and Alper K. Caglayan and David E. Eckhardt and David F. McAllister and James L. Walker, Jr. and John J.P. Kelly and John Knight. Analysis of Faults Detected in a Large-Scale Multi-Version Software Development Experiment, 1990.
abstract:
Twenty programs were built to the same specification of an inertial navigation problem. The programs were then subjected to a three-phase testing and debugging process: an acceptance test, a certification test, and an operational test. Less than 20% of the faults discovered during the certification and operational testing were non-unique, i.e. the same or very similar faults would be found in more than one program. However, some of these "common" faults spanned as many as half of the versions (...)