java.lang.Object
  ir.webutils.Spider
      ir.webutils.SiteSpider
public class SiteSpider extends Spider
A spider that limits itself to a given site.
| Field Summary |
|---|
| Fields inherited from class ir.webutils.Spider |
|---|
| count, linksToVisit, maxCount, retriever, saveDir, slow, visited |
| Constructor Summary |
|---|
| SiteSpider() |
| Method Summary | |
|---|---|
| java.util.List<Link> | getNewLinks(HTMLPage page) Gets links from the given page that are on the same host as the page. |
| static void | main(java.lang.String[] args) Spiders the web according to the command-line options, but stays within the given site (same URL host). |
| Methods inherited from class ir.webutils.Spider |
|---|
| doCrawl, go, handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, handleUCommandLineOption, indexPage, linkToHTMLPage, processArgs |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public SiteSpider()
| Method Detail |
|---|
public java.util.List<Link> getNewLinks(HTMLPage page)
Overrides: getNewLinks in class Spider
Parameters: page - The current page.
Returns: The links on page that have the same host as url.

public static void main(java.lang.String[] args)
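
The same-host rule that getNewLinks describes can be pictured with a small standalone sketch. This is not the SiteSpider source: the class name SameHostFilterSketch and the helper sameHostLinks are hypothetical, and plain strings stand in for Link and HTMLPage because their APIs are not shown on this page. It assumes only the standard java.net.URI class, resolves each candidate against the page URL, and keeps the ones whose host matches.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SameHostFilterSketch {

    // Hypothetical helper (not part of ir.webutils): returns the candidate
    // links, resolved against pageUrl, whose host equals the page's host.
    static List<String> sameHostLinks(String pageUrl, List<String> candidates) {
        URI base = URI.create(pageUrl);
        String pageHost = base.getHost();
        List<String> kept = new ArrayList<>();
        if (pageHost == null) {
            return kept; // no host to compare against
        }
        for (String candidate : candidates) {
            URI resolved;
            try {
                // Relative links are resolved against the page URL first.
                resolved = base.resolve(candidate);
            } catch (IllegalArgumentException e) {
                continue; // skip links that are not valid URIs
            }
            String host = resolved.getHost();
            // Host names are case-insensitive, so compare accordingly.
            if (host != null && host.equalsIgnoreCase(pageHost)) {
                kept.add(resolved.toString());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "a.html",                           // relative, same site
                "http://other.example.org/b.html",  // different host
                "http://WWW.EXAMPLE.COM/c.html",    // same host, different case
                "mailto:someone@example.org");      // no host component
        for (String link : sameHostLinks("http://www.example.com/docs/index.html", links)) {
            System.out.println(link);
        }
    }
}
```

Comparing hosts rather than URL prefixes keeps every page of the site in scope regardless of its path, which matches the "same URL host" wording used for main.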