CS 391L Machine Learning
Homework 4: Rule Induction


Due: Nov. 8, 2006


Current Weka Prism System

Weka contains a simple rule-learning system called Prism (weka.classifiers.rules.Prism). The system implements a top-down (general to specific) sequential-covering algorithm that employs a simple accuracy-based metric to pick an appropriate rule antecedent during rule construction. This algorithm is described in section 4.4 of the Witten and Frank textbook.
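
Concretely, the accuracy metric is just p/t: if adding a candidate attribute-value test makes the rule cover t training instances, of which p belong to the target class, Prism picks the test maximizing p/t, breaking ties in favor of larger p. The following minimal Java sketch of that selection step is illustrative only, not the actual Weka code:

    // Pick the candidate test maximizing p/t; break ties by larger p.
    // p[i] = covered instances of the target class for candidate i;
    // t[i] = total instances covered by the rule extended with candidate i.
    static int bestPrismAntecedent(int[] p, int[] t) {
        int best = -1;
        double bestRatio = -1.0;
        for (int i = 0; i < t.length; i++) {
            if (t[i] == 0) continue;  // candidate covers no instances
            double ratio = (double) p[i] / t[i];
            if (ratio > bestRatio
                    || (ratio == bestRatio && best >= 0 && p[i] > p[best])) {
                bestRatio = ratio;
                best = i;
            }
        }
        return best;  // index of the winning candidate, or -1
    }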

In this assignment, you should test rule learning on the soybean (full set), audiology, and splice (splice-new) data sets in /u/mooney/cs391L-code/weka/data/. Since Prism supports only nominal features, not continuous ones, it cannot run on many of the other data sets. The slightly edited version of Prism in the class directory collects several summary statistics on the size of the rule base by implementing the AdditionalMeasureProducer interface, exposing the number of rules learned (measureNumRules), the total number of rule antecedents learned (measureNumAntecedents), and the average number of antecedents per rule (measureAveAntecedents).
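
For reference, Weka's AdditionalMeasureProducer interface consists of just two methods, enumerateMeasures() and getMeasure(String). The sketch below shows one plausible way a classifier can expose such statistics; the actual edited Prism in the class directory may differ in detail, and the class and field names here are hypothetical:

    import java.util.Enumeration;
    import java.util.Vector;
    import weka.core.AdditionalMeasureProducer;

    // Sketch: exposing rule-base statistics as additional measures.
    public class RuleStatsExample implements AdditionalMeasureProducer {
        private double m_numRules, m_numAntecedents;  // set during training

        public Enumeration enumerateMeasures() {
            Vector names = new Vector(3);
            names.addElement("measureNumRules");
            names.addElement("measureNumAntecedents");
            names.addElement("measureAveAntecedents");
            return names.elements();
        }

        public double getMeasure(String name) {
            if (name.equals("measureNumRules")) return m_numRules;
            if (name.equals("measureNumAntecedents")) return m_numAntecedents;
            if (name.equals("measureAveAntecedents"))
                return m_numRules == 0 ? 0 : m_numAntecedents / m_numRules;
            throw new IllegalArgumentException(name + " not supported");
        }
    }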

Propositional Foil

The Foil heuristic for picking the best antecedent to add to a rule, as discussed in class, is somewhat more sophisticated than the simple heuristic in Prism. Make a simple propositional Foil-based learner called Pfoil. The easiest way to do this is to copy and edit the Prism code and simply change the heuristic it uses to pick the best rule-antecedent at each point. Computing the Foil heuristic may require storing additional information during rule construction.
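
As a reminder of the heuristic itself: Foil's information gain compares the information content of the rule before and after a candidate antecedent is added, weighted by the number of positive examples the extended rule retains. A sketch of the standard propositional formulation (double-check it against the class notes):

    // Propositional Foil information gain.
    // p0, n0 = positive/negative examples covered by the rule before
    //          adding the candidate antecedent; p1, n1 = counts after.
    // Pfoil picks the candidate maximizing this gain rather than p/t.
    static double foilGain(int p0, int n0, int p1, int n1) {
        if (p1 == 0) return 0.0;                         // no positives retained
        double infoBefore = -log2((double) p0 / (p0 + n0));
        double infoAfter  = -log2((double) p1 / (p1 + n1));
        return p1 * (infoBefore - infoAfter);            // gain weighted by coverage
    }

    static double log2(double x) { return Math.log(x) / Math.log(2); }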

Use the Experimenter GUI to run 10-fold cross-validation learning curves comparing Prism, Pfoil, and the more sophisticated rule learner weka.classifiers.rules.JRip on the soybean, audiology, and splice data. JRip is a version of W. Cohen's Ripper, a Pfoil-style algorithm that adds a form of reduced-error pruning (REP), using a held-out tuning set to prune learned rules and prevent over-fitting. If you give Ripper too little training data, it can produce the error "Not enough data for REP", so you cannot run it on a very small training set.
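
If you want a quick sanity check outside the GUI, a single 10-fold cross-validation run can also be done programmatically, as in the sketch below. The soybean.arff file name is an assumption; substitute whichever data file you are testing:

    import java.io.FileReader;
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.JRip;
    import weka.core.Instances;

    // One-off 10-fold CV run; use the Experimenter for the real experiments.
    public class CVCheck {
        public static void main(String[] args) throws Exception {
            Instances data = new Instances(
                new FileReader("/u/mooney/cs391L-code/weka/data/soybean.arff"));
            data.setClassIndex(data.numAttributes() - 1);  // assumes class is last
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new JRip(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }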

Remember to plot points more densely early in the learning curve, where changes are more pronounced. Note: these experiments are fairly computationally intensive and may take a while to run (particularly on the splice data). In the Experimenter's Analyse panel you can examine training time, training accuracy, testing time, number of rules, number of antecedents, and average number of antecedents per rule, as well as test accuracy.

In your report, discuss the following issues:

  1. How does the accuracy of Pfoil compare to Prism? Are differences statistically significant? Why?
  2. How does the complexity of rule bases generated by Pfoil compare to that of Prism? Why?
  3. How do the training and testing times of Pfoil compare to those of Prism? Why?
  4. How do both of these methods compare to the more sophisticated JRip rule learner in terms of accuracy, rule complexity, and run time? Why?
  5. How does the number of training examples affect the relative performance of different methods? Why?
  6. How does the relative performance of different methods vary across domains (i.e. datasets)? Why?

Submit your learning curves (in color) with your hard-copy report and include a discussion of the issues above. Edit the gnuplot files to reposition the key and shorten the system names so the graphs are more presentable.