ir.webutils
Class LinkHeuristic

java.lang.Object
  extended by ir.webutils.LinkHeuristic

public class LinkHeuristic
extends java.lang.Object

Scores a web link (ScoredAnchoredLink) according to how well it matches a set of "want strings" and "help strings".

The existing search heuristic considers four factors, listed in decreasing order of weight:

  1. (WC) The number of distinct want-strings that are found
  2. (HC) The number of distinct help-strings that are found
  3. (WT) The total number of times any want-string is found
  4. (HT) The total number of times any help-string is found

A page is scored as

S = 1000*WC + 100*HC + 10*WT + HT
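
As an illustration of the formula, the sketch below computes S for a block of text. It is not the class's actual implementation: the class name PageScoreSketch, the helper countOccurrences, and the matching rule (case-insensitive, non-overlapping substring matches) are assumptions made for this example only, since the documentation does not say how strings are matched.

```java
import java.util.Locale;

// Hypothetical sketch of the page score S = 1000*WC + 100*HC + 10*WT + HT.
// Assumes case-insensitive, non-overlapping substring matching; the real
// LinkHeuristic may match strings differently.
final class PageScoreSketch {

    /** Counts non-overlapping occurrences of needle in text. */
    static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty string would otherwise match everywhere
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    /** Computes S for the given text and string sets. */
    static double score(String text, String[] wantStrings, String[] helpStrings) {
        String lower = text.toLowerCase(Locale.ROOT);
        int wc = 0; // distinct want-strings found
        int wt = 0; // total want-string occurrences
        int hc = 0; // distinct help-strings found
        int ht = 0; // total help-string occurrences
        for (String want : wantStrings) {
            int n = countOccurrences(lower, want.toLowerCase(Locale.ROOT));
            if (n > 0) {
                wc++;
            }
            wt += n;
        }
        for (String help : helpStrings) {
            int n = countOccurrences(lower, help.toLowerCase(Locale.ROOT));
            if (n > 0) {
                hc++;
            }
            ht += n;
        }
        return 1000.0 * wc + 100.0 * hc + 10.0 * wt + ht;
    }

    public static void main(String[] args) {
        String page = "Java crawler tutorial: build a crawler that follows links";
        String[] want = {"crawler", "spider"};
        String[] help = {"java", "links"};
        // WC = 1, WT = 2, HC = 2, HT = 2, so S = 1000 + 200 + 20 + 2
        System.out.println(score(page, want, help)); // prints 1222.0
    }
}
```

Because the weights differ by a factor of ten, a single additional distinct want-string outweighs up to nine additional distinct help-strings.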

A link is scored partly based on the text appearing directly in the link and partly based on the surrounding page. If L is the S score for the text in the link and P is the S score for the overall page, then a link is scored as

L/2 + P/2

so that it gets half its score from its own text and half from the surrounding page.
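
A small worked example may make the combination concrete. The sketch below is hypothetical (the class LinkScoreSketch and its methods are not part of ir.webutils); it takes the four counts directly as inputs rather than extracting them from a real ScoredAnchoredLink or HTMLPage.

```java
// Hypothetical sketch of the link score L/2 + P/2, where L and P are both
// S scores (S = 1000*WC + 100*HC + 10*WT + HT). Not the library's own code.
final class LinkScoreSketch {

    /** Computes S from the four counts described above. */
    static double sScore(int wc, int hc, int wt, int ht) {
        return 1000.0 * wc + 100.0 * hc + 10.0 * wt + ht;
    }

    /** Averages the link-text score and the page score with equal weight. */
    static double combine(double linkTextScore, double pageScore) {
        return linkTextScore / 2 + pageScore / 2;
    }

    public static void main(String[] args) {
        // Link text: one want-string found once, no help-strings.
        double l = sScore(1, 0, 1, 0);   // 1010.0
        // Whole page: two want-strings (5 hits), one help-string (3 hits).
        double p = sScore(2, 1, 5, 3);   // 2153.0
        System.out.println(combine(l, p)); // prints 1581.5
    }
}
```

With equal weighting, a link with uninformative anchor text on a highly relevant page can still outscore a link with matching anchor text on an otherwise irrelevant page.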


Field Summary
 java.lang.String[] helpStrings
          The array of help strings to help find the want strings
 java.lang.String[] wantStrings
          The array of want strings that are desired
 
Constructor Summary
LinkHeuristic()
          Construct an empty heuristic
LinkHeuristic(java.lang.String[] wantStrings, java.lang.String[] helpStrings)
          Construct a heuristic with the given wantStrings and helpStrings
 
Method Summary
 double scoreLink(ScoredAnchoredLink link, HTMLPage page)
          Heuristically score the given link appearing on the given page
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

wantStrings

public java.lang.String[] wantStrings
The array of want strings that are desired


helpStrings

public java.lang.String[] helpStrings
The array of help strings to help find the want strings

Constructor Detail

LinkHeuristic

public LinkHeuristic()
Construct an empty heuristic


LinkHeuristic

public LinkHeuristic(java.lang.String[] wantStrings,
                     java.lang.String[] helpStrings)
Construct a heuristic with the given wantStrings and helpStrings

Method Detail

scoreLink

public double scoreLink(ScoredAnchoredLink link,
                        HTMLPage page)
Heuristically score the given link appearing on the given page