Document is an abstract class that provides tokenization of a document with stop-word removal and an iterator-like interface similar to StringTokenizer.
An object for iterating over a set of documents in a directory.
A simple data structure for storing a reference to a document file that includes information on the length of its document vector.
Gets and stores relevance feedback from the user and computes an updated query based on the original query and the retrieved documents rated relevant or irrelevant.
A Document stored as a file.
A data structure for the term vector of a document, stored as a HashMap that maps each token to a Weight object holding the weight of that token in the document.
An HTML file document where HTML commands are removed from the token stream.
An inverted index for vector-space information retrieval.
A lightweight object for storing information about a retrieved Document.
A normal ASCII text file Document.
A simple document represented by a String.
A lightweight object for storing information about a token (a.k.a. word, term) in an inverted index.
A lightweight object for storing information about an occurrence of a token (a.k.a. word, term) in a Document.
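As a rough illustration of how the pieces described above fit together, the following minimal sketch tokenizes text with stop-word removal, builds a HashMap-based term vector, computes its vector length, and constructs a small inverted index. It is not the package's actual code: the class name VectorSpaceSketch, its method names, and the tiny stop-word list are all hypothetical. Raw term counts stand in for the Weight objects, and document names stand in for token-occurrence objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the package.
class VectorSpaceSketch {

    // Assumed, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "of", "is"));

    // Tokenizes the text, drops stop words, and maps each remaining
    // token to its raw count (a simplification of a per-token weight).
    static Map<String, Double> termVector(String text) {
        Map<String, Double> vector = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            vector.merge(token, 1.0, Double::sum);
        }
        return vector;
    }

    // Euclidean length of a term vector.
    static double length(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Maps each token to the names of the documents that contain it.
    static Map<String, List<String>> invertedIndex(Map<String, String> docs) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String token : termVector(doc.getValue()).keySet()) {
                index.computeIfAbsent(token, k -> new ArrayList<>())
                     .add(doc.getKey());
            }
        }
        return index;
    }
}
```

Storing the vector length alongside each document reference, as the description above suggests, avoids recomputing it every time a similarity score is normalized at query time.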
For command line interfaces see the main methods of the following classes: