Peter Clark - Software
Software which co-workers and I have developed. To be used
solely at your own risk! We'd appreciate an acknowledgement if you
use any of these packages in your research. We'd also be interested in
hearing any comments or results you have from using this code.
Similarly, if you have any questions or problems with the s/w, please email me
(peter.e.clark@boeing.com). Have fun!
Contents:
- KM: The Knowledge Machine
- CN2 - Rule induction from examples
- Guiding Inductive Learning with a Qualitative Model
- LPE - Lazy Partial Evaluation
KM is one of the knowledge representation languages used by the University
of Texas at Austin. It is an efficient, expressive, frame-based AI
language with well-defined semantics, and includes facilities for
classification, situations, and sophisticated handling of multiple inheritance.
It is a much-evolved derivative of the Theo system.
Designed by Bruce Porter (porter@cs.utexas.edu) and myself, with
this implementation mainly by me.
Software
The algorithm is implemented in Lisp. Manuals, implementation, and
some worked examples are available from the UT Knowledge-Base Group page,
at http://www.cs.utexas.edu/users/mfkb/km.html.
Overview
This algorithm inductively learns a set of propositional if...then... rules
from a set of training examples. To do this, it
performs a general-to-specific beam search through
rule-space for the "best" rule, removes
training examples covered by that rule, then repeats until no more "good"
rules can be found. The original algorithm (Machine Learning
Journal paper below) defined "best" using a combination of entropy and a
significance test. The algorithm was later improved to replace this evaluation
function with the Laplace estimate (EWSL-91 paper, below), and also to
induce unordered rule sets as well as ordered rule lists ("decision lists").
The software implements the latest version (ie. using the Laplace
heuristic), but has flags which can be set to return it to the original
version. The algorithm was designed by Tim Niblett
(tim.niblett@turing.gla.ac.uk) and myself.
Software
The most recent version (below) of CN2 was implemented in C in 1990 by Robin
Boswell (rab@scms.rgu.ac.uk). It includes the source code, a Sparc
executable, a collection of databases in CN2 format, and documentation.
Rick Kufrin (rkufrin@ncsa.uiuc.edu) has
kindly provided minor changes (included in the release below) which allow
CN2 to run under several additional platforms/OS, including:
SunOS 4.1.3, Solaris 5.4, IRIX 5.3, HP-UX A.09.05, and Linux 1.2.8.
Johannes Fuernkranz has also helped getting CN2 to run under Windows XP.
Francisco Reinaldo (Univ Porto, Portugal) and Marcus Siqueira (UnilesteMG, Minas Gerais, Brazil) created the Executable for Windows XP below. Many thanks to all for their contributions.
- Source Code, User Manual, Technical Papers, Examples - Source code, including documentation and examples. Choose your preferred format:
- Executable for Windows XP:
Instructions:
Download both files to the same folder and click on cn.exe. Many thanks to
Francisco Reinaldo (Univ Porto, Portugal) and Marcus Siqueira (UnilesteMG, Minas Gerais, Brazil) for making this port!
For User Manual and example files, download the software bundle
listed just prior to this executable.
References
-
P. Clark and R. Boswell.
Rule induction with CN2: Some recent improvements.
In Y. Kodratoff, editor, Machine Learning - EWSL-91, pages
151-163, Berlin, 1991. Springer-Verlag.
(Abstract
and
compressed postscript).
- P. Clark and T. Niblett.
The CN2 Induction Algorithm.
Machine Learning, 3(4):261-283, 1989.
(Abstract
and
compressed postscript).
Overview
Input: a set of training examples and a qualitative model.
Output: a
set of propositional if...then... classification rules which are
also "explainable" by the qualitative model.
This package allows a qualitative model to bias induction of
propositional if...then... rules (using CN2), so that only rules which
are also "explainable" by the qualitative model (approximately:
having a corresponding path in the influence graph)
are found. This is important
for practical application of ML, where we wish to use domain knowledge
as well as training data to guide rule learning.
Learning occurs in two phases: First, a specialisation lattice containing
only (and all) rules "explainable" by the QM is explicitly enumerated.
Second, the CN2 induction algorithm is used to learn rules from training
data, but CN2's specialisation operator restricted to work on the
QM-generated specialisation lattice. (NB: other implementations of this
method, eg. which don't explicitly enumerate the lattice a priori,
would be equally valid).
The authors are Stan Matwin (stan@csi.uottawa.ca) and myself.
Software
The algorithm is implemented in Quintus Prolog. It was made available on
WWW in Jan 1995 so has not been extensively tested outside our lab yet.
Contact us if you have questions. The software contains source code,
the domain models and data sets used in the ML93 paper (below), and
documentation. Knowledge of Prolog isn't needed to use the software.
The software is public domain and freely available.
For those without Quintus Prolog -- we also provide Sun Sparc executables of
this software (ie. compiled Prolog, without the Prolog development
environment). These do not require a Quintus Prolog licence (nor even
any knowledge of Prolog) to run, but of course require a Sparc machine.
A licence may be needed to use these executables for commercial
use; contact me for info.
To download, click below. The software is tar'ed - to unbundle it,
do "tar xf <file>" where <file> is the file
that you stored the downloaded code in.
References
- P. Clark and S. Matwin.
Using qualitative models to guide inductive learning.
In P. Utgoff, editor, Proc. Tenth Int. Machine Learning
Conference (ML-93), pages 49-56, CA, 1993. Kaufmann.
(Abstract
and
compressed postscript).
Overview
Lazy partial evaluation is a form of speed-up learning, when reasoning
with a domain theory. It is a hybrid between:
- partial evaluation (PE),
where a procedure is "unwound" in all possible
ways and the results cached and indexed.
- explanation-based learning (EBL),
where just execution paths through
the procedure which prove specific theorems are identified and cached.
LPE does "partial evaluation on demand". It can be advantageous over PE
as it avoids redundant expansion of a procedure (hence saving memory and
CPU time). It can be advantageous over EBL as it avoids proving theorems
from scratch with the (slow) original domain theory (when no cached
solution applies), and avoids the "masking effect" where suboptimal, cached
solutions are chosen in preference to better solutions implicit in the
domain theory (when a cached solution applies).
It is described in detail in the paper below. The authors are Rob Holte
(holte@csi.uottawa.ca) and myself.
Implementation
LPE is implemented in Quintus Prolog. It comes with documentation, demos,
and the domain theories used in that paper.
The software is public domain and freely available.
To download, click below. The software is tar'ed - to unbundle it,
do "tar xf <file>" where <file> is the file
that you stored the downloaded code in.
References
- P. Clark and R. Holte.
Lazy partial evaluation: An integration of explanation-based
generalisation and partial evaluation.
In D. Sleeman and P. Edwards, editors, Proc. Ninth Int. Machine
Learning Conference (ML-92), pages 82-91, CA, 1992. Kaufmann.
(Abstract
and
compressed postscript).
peter.e.clark@boeing.com