Version of Experiment for queries that have continuously rated
gold-standard document relevance judgements, which adds evaluation
with NDCG. Computes and reports NDCG values at all ranks up to a
specified limit (NDCGlimit).
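The NDCG computation described above can be sketched as follows. This is a minimal illustration, not the class's actual code. It assumes the common discount DCG@k = sum over i = 1..k of rel_i / log2(i + 1) (the class may use a different variant), takes the ideal ordering from the gold ratings sorted in descending order, and treats retrieved documents without a gold rating as having relevance 0. NdcgSketch and ndcgAtAllRanks are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class NdcgSketch {
    // Hypothetical helper: returns ndcg[k - 1] = NDCG@k for k = 1..limit.
    // Assumes the discount log2(k + 1); unrated documents count as relevance 0.
    static double[] ndcgAtAllRanks(List<String> ranked, Map<String, Double> ratings, int limit) {
        // Ideal ordering: all gold ratings sorted from highest to lowest.
        List<Double> ideal = new ArrayList<>(ratings.values());
        ideal.sort(Collections.reverseOrder());

        double[] ndcg = new double[limit];
        double dcg = 0.0;   // running DCG of the system ranking
        double idcg = 0.0;  // running DCG of the ideal ranking
        for (int k = 1; k <= limit; k++) {
            double discount = Math.log(k + 1) / Math.log(2); // log2(k + 1)
            if (k <= ranked.size()) {
                dcg += ratings.getOrDefault(ranked.get(k - 1), 0.0) / discount;
            }
            if (k <= ideal.size()) {
                idcg += ideal.get(k - 1) / discount;
            }
            // Guard against queries whose gold ratings are all zero.
            ndcg[k - 1] = idcg > 0 ? dcg / idcg : 0.0;
        }
        return ndcg;
    }
}
```

Keeping running DCG and ideal-DCG sums yields NDCG@1 through NDCG@limit in a single pass once the ideal ratings are sorted.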
Assumes that, in the queries file, the results for each query are given
as a list of pairs, each a document name followed by a relevance score between 0 and 1.
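Reading that format into a name-to-rating map, such as the one held in ratingsMap, might look like the sketch below. It assumes, hypothetically, that the pairs for one query sit on a single whitespace-separated line (for example `d1.html 0.8 d2.html 0.25`); the real queries-file layout may differ, and RatingsParser is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

final class RatingsParser {
    // Hypothetical helper: parses "doc1 0.9 doc2 0.4 ..." into a name -> rating map.
    // Assumes all pairs for one query are on one whitespace-separated line.
    static Map<String, Double> parseRatings(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name/score pairs: " + line);
        }
        Map<String, Double> ratings = new HashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            // Double.parseDouble throws NumberFormatException for a malformed score.
            double score = Double.parseDouble(tokens[i + 1]);
            if (score < 0.0 || score > 1.0) {
                throw new IllegalArgumentException("Score outside [0, 1]: " + score);
            }
            ratings.put(tokens[i], score);
        }
        return ratings;
    }
}
```

Rejecting odd token counts and out-of-range scores surfaces a malformed queries file early instead of silently skewing the NDCG results.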
public java.util.Map<java.lang.String,java.lang.Double> ratingsMap
HashMap that stores the mapping of document names to their gold-standard relevance ratings
public static int NDCGlimit
The maximum N for computing NDCG @ N
public double[] NDCGvalues
Current sums of NDCG values at each rank up to NDCGlimit.
Updated when processing each query and eventually used to compute
the average NDCG across all queries.
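The running per-rank sums and the final average can be sketched with a small accumulator. This assumes the sums are kept in an array indexed by rank (index k - 1 holding the sum of NDCG@k over the queries seen so far) and that each query contributes an array of at least NDCGlimit values; NdcgAccumulator is a hypothetical name, not part of the class.

```java
final class NdcgAccumulator {
    private final double[] sums; // sums[k - 1] = running sum of NDCG@k
    private int queries = 0;     // number of queries added so far

    NdcgAccumulator(int limit) {
        sums = new double[limit];
    }

    // Adds one query's NDCG@1..NDCG@limit; assumes the array has at least limit entries.
    void add(double[] ndcgForQuery) {
        for (int k = 0; k < sums.length; k++) {
            sums[k] += ndcgForQuery[k];
        }
        queries++;
    }

    // Average NDCG at each rank across all added queries (all zeros if none were added).
    double[] averages() {
        double[] avg = new double[sums.length];
        for (int k = 0; k < sums.length; k++) {
            avg[k] = queries == 0 ? 0.0 : sums[k] / queries;
        }
        return avg;
    }
}
```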
public ExperimentRated(java.io.File corpusDir,
Constructor that just calls the Experiment constructor
public static void main(java.lang.String[] args)
Evaluate retrieval performance on a given query test corpus and
generate a recall/precision graph and a table of NDCG results.
Command format: "Experiment [OPTION]* [DIR] [QUERIES] [OUTFILE]" where:
DIR is the name of the directory whose files should be indexed.
QUERIES is a file of queries paired with relevant docs
and continuous gold-standard relevance ratings (see queryFile).
OUTFILE is the name of the output file. The plot data for the
recall/precision curve is stored in this file, a gnuplot file for
the graph is written to the same name with a ".gplot" extension,
and an NDCG result file is written to the same name with a ".ndcg" extension.
OPTIONs can be
"-html" to specify that the files are HTML and their tags should be removed, and
"-stem" to specify that tokens should be stemmed with the Porter stemmer.
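One way to parse this command line is sketched below. It assumes options must precede the three positional arguments and that the ".gplot" and ".ndcg" names are formed by appending the extension to OUTFILE; ExperimentArgs and its fields are hypothetical and are not the class's real API.

```java
import java.io.File;

final class ExperimentArgs {
    boolean html = false; // "-html": strip HTML tags from corpus files
    boolean stem = false; // "-stem": apply Porter stemming to tokens
    File corpusDir, queryFile, outFile, gplotFile, ndcgFile;

    static ExperimentArgs parse(String[] args) {
        ExperimentArgs a = new ExperimentArgs();
        int i = 0;
        // Options come first; any leading argument starting with '-' is treated as an option.
        for (; i < args.length && args[i].startsWith("-"); i++) {
            switch (args[i]) {
                case "-html": a.html = true; break;
                case "-stem": a.stem = true; break;
                default: throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        if (args.length - i != 3) {
            throw new IllegalArgumentException(
                "Usage: Experiment [OPTION]* [DIR] [QUERIES] [OUTFILE]");
        }
        a.corpusDir = new File(args[i]);
        a.queryFile = new File(args[i + 1]);
        a.outFile = new File(args[i + 2]);
        // Assumed naming: the extension is appended to OUTFILE, not substituted for it.
        a.gplotFile = new File(args[i + 2] + ".gplot");
        a.ndcgFile = new File(args[i + 2] + ".ndcg");
        return a;
    }
}
```

For example, `-stem corpus queries.txt results` would enable stemming and, under the assumed naming, also produce `results.gplot` and `results.ndcg`.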