ir.webutils
Class DirectorySpider

java.lang.Object
  extended by ir.webutils.Spider
      extended by ir.webutils.DirectorySpider

public class DirectorySpider
extends Spider

A Spider that limits its crawl to pages in or below the directory of its starting URL.


Field Summary
 
Fields inherited from class ir.webutils.Spider
count, linksToVisit, maxCount, retriever, saveDir, slow, visited
 
Constructor Summary
DirectorySpider()
           
 
Method Summary
 java.util.List<Link> getNewLinks(HTMLPage page)
          Gets links from the page that are in or below the starting directory.
protected  void handleUCommandLineOption(java.lang.String value)
          Sets the initial URL from the "-u" argument, then calls the corresponding superclass method.
static void main(java.lang.String[] args)
          Spiders the web according to the command line options, but only in or below the directory of the start URL.
 
Methods inherited from class ir.webutils.Spider
doCrawl, go, handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, indexPage, linkToHTMLPage, processArgs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DirectorySpider

public DirectorySpider()

Method Detail

getNewLinks

public java.util.List<Link> getNewLinks(HTMLPage page)
Gets links from the page that are in or below the starting directory.

Overrides:
getNewLinks in class Spider
Parameters:
page - The current page.
Returns:
The links on page that are in or below the directory of the first page.
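
The "in or below the starting directory" test can be illustrated with a minimal, self-contained sketch. It is not the actual DirectorySpider implementation: the class DirectoryFilterSketch and its methods directoryOf, inOrBelow, and filter are hypothetical names, and the sketch works on java.net.URI values rather than the library's Link and HTMLPage types. It assumes that "directory" means the URL path up to and including its last slash, and that a link must be on the same host to qualify.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of ir.webutils.
public class DirectoryFilterSketch {

    /** Returns the path up to and including its last '/', e.g. "/a/b.html" gives "/a/". */
    static String directoryOf(URI url) {
        String path = url.getPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // a URL with no path is treated as the site root
        }
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    /** True if candidate is on the same host as start and its path lies in or below start's directory. */
    static boolean inOrBelow(URI start, URI candidate) {
        String host = start.getHost();
        String path = candidate.getPath();
        return host != null
                && host.equalsIgnoreCase(candidate.getHost())
                && path != null
                && path.startsWith(directoryOf(start));
    }

    /** Keeps only the links that pass the inOrBelow test, preserving their order. */
    static List<URI> filter(URI start, List<URI> links) {
        List<URI> kept = new ArrayList<>();
        for (URI link : links) {
            if (inOrBelow(start, link)) {
                kept.add(link);
            }
        }
        return kept;
    }
}
```

Under these assumptions, a crawl started at http://www.example.edu/users/alice/index.html would keep a link to /users/alice/papers/p1.html and drop links to /users/bob/ or to any other host.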

handleUCommandLineOption

protected void handleUCommandLineOption(java.lang.String value)
Sets the initial URL from the "-u" argument, then calls the corresponding superclass method.

Overrides:
handleUCommandLineOption in class Spider
Parameters:
value - The value of the "-u" command line argument.
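
The "record the value, then delegate to the superclass" pattern described above might look roughly like the following sketch. BaseSpider and DirectoryLimitedSpider are hypothetical stand-ins, not the real ir.webutils classes. The firstUrl field and the assumption that the superclass queues the URL in linksToVisit are illustrative guesses only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of ir.webutils.
public class UOptionSketch {

    /** Stand-in for the superclass. Assumed behavior: queue the start URL for crawling. */
    public static class BaseSpider {
        public final List<String> linksToVisit = new ArrayList<>();

        protected void handleUCommandLineOption(String value) {
            linksToVisit.add(value);
        }
    }

    /** Stand-in for the subclass: remembers the start URL, then lets the superclass do its work. */
    public static class DirectoryLimitedSpider extends BaseSpider {
        public String firstUrl; // hypothetical field holding the "-u" value

        @Override
        protected void handleUCommandLineOption(String value) {
            firstUrl = value;                       // record the initial URL first
            super.handleUCommandLineOption(value);  // then run the inherited handling
        }
    }

    /** Convenience helper that simulates processing a "-u" argument. */
    public static DirectoryLimitedSpider demo(String url) {
        DirectoryLimitedSpider spider = new DirectoryLimitedSpider();
        spider.handleUCommandLineOption(url);
        return spider;
    }
}
```

Recording the URL in the subclass before delegating means the directory restriction has its reference point available by the time the inherited handling runs.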

main

public static void main(java.lang.String[] args)
Spiders the web according to the command line options, but only in or below the directory of the start URL.