ir.webutils
Class BeamSearchSiteSpider

java.lang.Object
  extended by ir.webutils.Spider
      extended by ir.webutils.BeamSearchSpider
          extended by ir.webutils.BeamSearchSiteSpider

public class BeamSearchSiteSpider
extends BeamSearchSpider

A BeamSearchSpider that restricts its crawl to a single site (web host): it follows only links that are on the same host as the page they appear on.


Field Summary
 
Fields inherited from class ir.webutils.BeamSearchSpider
beamSize, goal, goalPage, heuristic
 
Fields inherited from class ir.webutils.Spider
count, linksToVisit, maxCount, retriever, saveDir, slow, visited
 
Constructor Summary
BeamSearchSiteSpider()
           
 
Method Summary
 java.util.List<Link> getNewLinks(HTMLPage page)
          Gets links from the given page that are on the same host as the page.
 static void main(java.lang.String[] args)
          Searches the web using beam search, configured by the command-line options that BeamSearchSpider and Spider handle, but stays within the host of the initial page.
 
Methods inherited from class ir.webutils.BeamSearchSpider
constructLinkHeuristic, doCrawl, go, handleBCommandLineOption, handleHCommandLineOption, handleUCommandLineOption, handleWCommandLineOption, processArgs, scoreLinks
 
Methods inherited from class ir.webutils.Spider
handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, indexPage, linkToHTMLPage
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BeamSearchSiteSpider

public BeamSearchSiteSpider()

Method Detail

getNewLinks

public java.util.List<Link> getNewLinks(HTMLPage page)
Gets links from the given page that are on the same host as the page.

Overrides:
getNewLinks in class BeamSearchSpider
Parameters:
page - The current page.
Returns:
A list of the links on page whose host is the same as the host of page's own URL.
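
The same-host restriction described above can be illustrated with a small standalone sketch. It is not the source of this class: it works on plain URL strings rather than on Link and HTMLPage objects, and the names SameHostFilter, sameHostLinks, and hostOf are hypothetical, introduced only for this example. It assumes that "same host" means a case-insensitive match on the host component of an absolute URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ir.webutils: shows one way to keep
// only the links that share a host with the page they were found on.
final class SameHostFilter {

    // Returns the links whose host equals the host of pageUrl,
    // compared case-insensitively. Order of the input is preserved.
    static List<String> sameHostLinks(String pageUrl, List<String> links) {
        List<String> kept = new ArrayList<>();
        String pageHost = hostOf(pageUrl);
        if (pageHost == null) {
            return kept; // no usable host on the page, so nothing can match
        }
        for (String link : links) {
            // equalsIgnoreCase(null) is false, so unparsable links are dropped
            if (pageHost.equalsIgnoreCase(hostOf(link))) {
                kept.add(link);
            }
        }
        return kept;
    }

    // Extracts the host of an absolute URL, or returns null when the
    // string cannot be parsed or carries no host component.
    static String hostOf(String url) {
        try {
            return URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
            "http://www.example.com/a.html",
            "http://other.example.org/b.html",
            "http://WWW.EXAMPLE.COM/c/d.html");
        System.out.println(
            sameHostLinks("http://www.example.com/index.html", links));
    }
}
```

In this sketch the off-site link is discarded before it could be scored or queued, which is the effect the overriding getNewLinks is documented to have on the beam search: the beam only ever contains pages from the starting host.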

main

public static void main(java.lang.String[] args)
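Searches the web using beam search, configured by the command-line options that BeamSearchSpider and Spider handle (see the inherited handle...CommandLineOption methods), but stays within the host of the initial page.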