ir.eval
Class Experiment

java.lang.Object
  extended by ir.eval.Experiment

public class Experiment
extends java.lang.Object

Contains methods for running information-retrieval evaluation experiments, specifically for generating recall-precision curves from a test corpus of query/relevant-document pairs.


Field Summary
 java.io.File corpusDir
          The directory from which the indexed documents come.
 java.io.File outFile
          The output file where final recall/precision result data is printed.
 java.io.File queryFile
          The file with the list of queries and results to be tested.
static double[] RECALL_LEVELS
          The standard recall levels at which precision values are plotted.
 
Constructor Summary
Experiment(java.io.File corpusDir, java.io.File queryFile, java.io.File outFile, short docType, boolean stem)
          Create an Experiment object for generating recall/precision curves.
Experiment(InvertedIndex index, java.io.File queryFile, java.io.File outFile)
          Create an Experiment object for generating recall/precision curves using a provided InvertedIndex.
 
Method Summary
static void main(java.lang.String[] args)
          Evaluate retrieval performance on a given query test corpus and generate a recall/precision graph.
 void makeRpCurve()
          Process and evaluate all queries and generate the recall-precision curve.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RECALL_LEVELS

public static final double[] RECALL_LEVELS
The standard recall levels at which precision values are plotted.
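The values of RECALL_LEVELS are not listed in this documentation. A common choice in IR evaluation is the 11 standard levels 0.0, 0.1, ..., 1.0, where the interpolated precision at level r is the highest precision observed at any rank whose recall is at least r. The following minimal sketch assumes those 11 levels; the class InterpolatedPrecision and its method interpolate are hypothetical and not part of ir.eval.

```java
import java.util.List;
import java.util.Set;

public class InterpolatedPrecision {
    // ASSUMPTION: the conventional 11 standard recall levels.
    // The actual contents of Experiment.RECALL_LEVELS are not documented.
    static final double[] LEVELS = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0};

    /**
     * Interpolated precision at each level in LEVELS for a single query.
     * ranked:   retrieved document names, best match first.
     * relevant: the gold-standard relevant document names for the query.
     */
    static double[] interpolate(List<String> ranked, Set<String> relevant) {
        int n = ranked.size();
        double[] recall = new double[n];
        double[] precision = new double[n];
        int hits = 0;
        // Recall and precision after each rank of the retrieved list.
        for (int i = 0; i < n; i++) {
            if (relevant.contains(ranked.get(i))) hits++;
            recall[i] = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
            precision[i] = (double) hits / (i + 1);
        }
        double[] result = new double[LEVELS.length];
        for (int k = 0; k < LEVELS.length; k++) {
            // Maximum precision over all ranks whose recall reaches this level.
            double best = 0.0;
            for (int i = 0; i < n; i++) {
                if (recall[i] >= LEVELS[k] - 1e-9) best = Math.max(best, precision[i]);
            }
            result[k] = best;
        }
        return result;
    }
}
```

For a ranking d1, d2, d3, d4 with relevant set {d1, d3}, precision is 1.0 at every level up to 0.5 and 2/3 from 0.6 to 1.0, because recall 1.0 is first reached at rank 3.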


corpusDir

public java.io.File corpusDir
The directory from which the indexed documents come.


queryFile

public java.io.File queryFile
The file with the list of queries and results to be tested. The file is assumed to contain three lines per query:
1) A line of text for the query.
2) A line of filenames from corpusDir that are relevant to this query, separated by spaces.
3) A blank line separating it from the next query.
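A minimal sketch of reading the three-line query format described above. It is not taken from ir.eval: the class QueryFileParser and its QueryJudgment type are hypothetical. The sketch splits filenames on any run of whitespace (slightly more lenient than the single space the format specifies) and skips blank lines, so the separator line needs no special handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QueryFileParser {
    /** One query and the names of its relevant documents (hypothetical helper type). */
    static final class QueryJudgment {
        final String query;
        final List<String> relevantFiles;

        QueryJudgment(String query, List<String> relevantFiles) {
            this.query = query;
            this.relevantFiles = relevantFiles;
        }
    }

    /** Reads records of: query line, space-separated relevant filenames, blank line. */
    static List<QueryJudgment> parse(BufferedReader in) throws IOException {
        List<QueryJudgment> result = new ArrayList<>();
        String query;
        while ((query = in.readLine()) != null) {
            // Blank separator lines (and any extra blank lines) are skipped here.
            if (query.trim().isEmpty()) continue;
            String rels = in.readLine();
            List<String> files = new ArrayList<>();
            if (rels != null && !rels.trim().isEmpty()) {
                files.addAll(Arrays.asList(rels.trim().split("\\s+")));
            }
            result.add(new QueryJudgment(query, files));
        }
        return result;
    }
}
```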


outFile

public java.io.File outFile
The output file where final recall/precision result data is printed.

Constructor Detail

Experiment

public Experiment(java.io.File corpusDir,
                  java.io.File queryFile,
                  java.io.File outFile,
                  short docType,
                  boolean stem)
           throws java.io.IOException
Create an Experiment object for generating recall/precision curves.

Parameters:
corpusDir - The directory of files to index.
queryFile - The file of query/relevant-docs pairs to evaluate.
outFile - File for output precision/recall data.
docType - The type of documents to index (see docType in DocumentIterator).
stem - Whether tokens should be stemmed with the Porter stemmer.
Throws:
java.io.IOException

Experiment

public Experiment(InvertedIndex index,
                  java.io.File queryFile,
                  java.io.File outFile)
           throws java.io.IOException
Create an Experiment object for generating recall/precision curves using a provided InvertedIndex.

Parameters:
index - An InvertedIndex object that contains an indexed document collection.
queryFile - The file of query/relevant-docs pairs to evaluate.
outFile - File for output precision/recall data.
Throws:
java.io.IOException

Method Detail

makeRpCurve

public void makeRpCurve()
                 throws java.io.IOException
Process and evaluate all queries and generate the recall-precision curve.

Throws:
java.io.IOException
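makeRpCurve is documented only at a high level. One plausible final step, assuming per-query interpolated precision values have already been computed at each recall level, is to macro-average them across queries and write one "recall precision" pair per line, a two-column layout gnuplot can plot directly. The actual contents of outFile are not specified here, so the class RpCurveAverager and its output layout are hypothetical.

```java
import java.util.List;
import java.util.Locale;

public class RpCurveAverager {
    /** Macro-averages per-query precision arrays, each indexed by recall level. */
    static double[] average(List<double[]> perQuery, int numLevels) {
        double[] sum = new double[numLevels];
        for (double[] p : perQuery) {
            for (int k = 0; k < numLevels; k++) sum[k] += p[k];
        }
        if (!perQuery.isEmpty()) {
            for (int k = 0; k < numLevels; k++) sum[k] /= perQuery.size();
        }
        return sum;
    }

    /** ASSUMED output layout: one "recall precision" pair per line. */
    static String toPlotData(double[] levels, double[] precision) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < levels.length; k++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
            sb.append(String.format(Locale.ROOT, "%.1f %.4f\n", levels[k], precision[k]));
        }
        return sb.toString();
    }
}
```

Macro-averaging gives every query equal weight regardless of how many relevant documents it has, which is the usual convention for averaged recall-precision curves.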

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Evaluate retrieval performance on a given query test corpus and generate a recall/precision graph. Command format:

    Experiment [OPTION]* [DIR] [QUERIES] [OUTFILE]

where:
DIR - The name of the directory whose files should be indexed.
QUERIES - A file of queries paired with their relevant documents (see queryFile).
OUTFILE - The name of the file that receives the output. The plot data for the recall-precision curve is stored in this file, and the gnuplot file for the graph has the same name with a ".gplot" extension.
OPTION - Either "-html", to specify HTML files whose HTML tags should be removed, or "-stem", to specify that tokens should be stemmed with the Porter stemmer.

Throws:
java.io.IOException
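A sketch of how the command format for main might be parsed. The class ExperimentArgs is hypothetical. It assumes all options precede the three positional arguments, and that the gnuplot file name is formed by appending ".gplot" to OUTFILE (the description does not say whether an existing extension is replaced). How "-html" maps onto the constructor's docType value is not shown, since the DocumentIterator constants are not documented here.

```java
import java.io.File;

public class ExperimentArgs {
    boolean html;   // "-html": input files are HTML whose tags should be removed
    boolean stem;   // "-stem": tokens should be stemmed with the Porter stemmer
    File corpusDir;
    File queryFile;
    File outFile;

    /** Parses "[OPTION]* DIR QUERIES OUTFILE"; throws on unknown options or a wrong argument count. */
    static ExperimentArgs parse(String[] args) {
        ExperimentArgs a = new ExperimentArgs();
        int i = 0;
        // Leading arguments that start with '-' are options; the rest are positional.
        for (; i < args.length && args[i].startsWith("-"); i++) {
            switch (args[i]) {
                case "-html": a.html = true; break;
                case "-stem": a.stem = true; break;
                default: throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        if (args.length - i != 3) {
            throw new IllegalArgumentException("Usage: Experiment [OPTION]* [DIR] [QUERIES] [OUTFILE]");
        }
        a.corpusDir = new File(args[i]);
        a.queryFile = new File(args[i + 1]);
        a.outFile = new File(args[i + 2]);
        return a;
    }

    /** ASSUMPTION: the gnuplot file name is OUTFILE with ".gplot" appended. */
    File gplotFile() {
        return new File(outFile.getPath() + ".gplot");
    }
}
```

For example, the arguments -stem corpus queries.txt results enable stemming, leave HTML stripping off, and yield results.gplot as the gnuplot file name.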