ir.webutils
Class SiteSpider

java.lang.Object
  extended by ir.webutils.Spider
      extended by ir.webutils.SiteSpider

public class SiteSpider
extends Spider

A spider that limits itself to a given site.


Field Summary
 
Fields inherited from class ir.webutils.Spider
count, linksToVisit, maxCount, retriever, saveDir, slow, visited
 
Constructor Summary
SiteSpider()
           
 
Method Summary
 java.util.List<Link> getNewLinks(HTMLPage page)
          Gets links from the given page that are on the same host as the page.
static void main(java.lang.String[] args)
          Spider the web according to the given command-line options, but stay within the given site (same URL host).
 
Methods inherited from class ir.webutils.Spider
doCrawl, go, handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, handleUCommandLineOption, indexPage, linkToHTMLPage, processArgs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SiteSpider

public SiteSpider()

Method Detail

getNewLinks

public java.util.List<Link> getNewLinks(HTMLPage page)
Gets links from the given page that are on the same host as the page.

Overrides:
getNewLinks in class Spider
Parameters:
page - The current page.
Returns:
A list of links on page that have the same host as the page's URL.
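
Example:
The same-host restriction described above can be illustrated with a minimal, standalone sketch. It is not the SiteSpider source: the class name SameHostFilter and the method sameHostLinks are hypothetical, and java.net.URI stands in for the ir.webutils Link and HTMLPage types, whose APIs are not shown on this page. The sketch also assumes host names are compared case-insensitively.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating same-host link filtering;
// not part of ir.webutils.
public class SameHostFilter {

    /**
     * Returns the links whose host matches the host of pageUri.
     * Links without a host component (e.g. mailto: URIs) are dropped.
     */
    public static List<URI> sameHostLinks(URI pageUri, List<URI> links) {
        List<URI> kept = new ArrayList<>();
        String pageHost = pageUri.getHost();
        if (pageHost == null) {
            return kept; // the page has no host to compare against
        }
        for (URI link : links) {
            String host = link.getHost();
            // Assumption: host names are compared ignoring case.
            if (host != null && host.equalsIgnoreCase(pageHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.example.com/index.html");
        List<URI> links = List.of(
            URI.create("http://www.example.com/about.html"),
            URI.create("http://other.example.org/"),
            URI.create("http://WWW.EXAMPLE.COM/docs/"));
        // Keeps the first and third links; the second is on another host.
        System.out.println(sameHostLinks(page, links));
    }
}
```

In this sketch, links on a different host are simply omitted from the returned list, so a crawler that only follows the returned links never leaves the starting site.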

main

public static void main(java.lang.String[] args)
Spider the web according to the given command-line options, but stay within the given site (same URL host).