| Class | Description |
|---|---|
| DirectorySpider | Spider that limits itself to the directory it started in. |
| Graph | Graph data structure. |
| HTMLPage | HTMLPage is a representation of information about a web page. |
| HTMLPageRetriever | HTMLPageRetriever allows clients to download web pages from URLs. |
| HTMLParserMaker | HTMLParserMaker allows clients to retrieve an HTMLEditorKit.Parser instance. |
| Link | Link is a class that contains a URL. |
| LinkExtractor | LinkExtractor defines a callback that extracts the links from an HTML document and provides functionality to parse a document. |
| Node | Node in the Graph data structure. |
| RobotExclusionSet | RobotExclusionSet provides support for the Robots Exclusion Protocol. |
| RobotsMetaTagParser | Parser callback that extracts robots META tag information. |
| SafeHTMLPage | SafeHTMLPage is an immutable representation of information about a web page that includes information about whether or not this page can be indexed. |
| SafeHTMLPageRetriever | Keeps track of Robot Exclusion information. |
| SiteSpider | A spider that limits itself to a given site. |
| Spider | Spider defines a framework for writing a web crawler. |
| StringSearchResult | Lightweight object that stores both the number of DIFFERENT strings from a set of search strings that are found in a text and the total number of occurrences in the text of ANY of the strings in the set. |
| URLChecker | URLChecker tries to clean up some URLs that do not conform to the standard and cause confusion. |
| WebPage | WebPage is a static utility class that provides operations for downloading web pages. |
| WebPageViewer | WebPageViewer contains utilities to download and display HTML pages. |
| YahooCategoryLinkExtractor | YahooCategoryLinkExtractor defines a callback for the Swing HTML parser that extracts links to subcategories from a Yahoo directory page. |
| YahooSiteLinkExtractor | YahooSiteLinkExtractor defines a callback that extracts site links from a Yahoo directory page and provides functionality to parse a document. |
| YahooSpider | Spider that extracts and saves a specified number of randomly chosen pages for a particular topic category in the Yahoo directory. |
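
The two counts described for StringSearchResult (how many different search strings appear, and how many occurrences there are in total) can be illustrated with a small sketch. The class `StringSearchSketch` and its `count` method are hypothetical names invented for this illustration; they are not the package's API, and the sketch assumes non-overlapping, case-sensitive matching and a collection of distinct search strings.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical illustration only; not the package's StringSearchResult class.
final class StringSearchSketch {

    /**
     * Returns a two-element array: index 0 holds the number of different
     * search strings found in the text, index 1 holds the total number of
     * occurrences of any of the search strings.
     */
    static int[] count(String text, Collection<String> terms) {
        int distinct = 0;
        int total = 0;
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // an empty string would match at every position
            }
            int hits = 0;
            int from = text.indexOf(term);
            while (from >= 0) {
                hits++;
                // Continue after the current match (non-overlapping search).
                from = text.indexOf(term, from + term.length());
            }
            if (hits > 0) {
                distinct++;
            }
            total += hits;
        }
        return new int[] {distinct, total};
    }

    public static void main(String[] args) {
        int[] r = count("a spider crawls; the spider stops",
                        Arrays.asList("spider", "crawl", "robot"));
        System.out.println(r[0] + " different strings, " + r[1] + " occurrences");
    }
}
```

Keeping the two numbers separate lets a caller distinguish a page that mentions many of the search strings once from a page that repeats a single string many times.
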

| Exception | Description |
|---|---|
| PathDisallowedException | PathDisallowedException is thrown to indicate that a client program tried to access a path that was disallowed by either a robots.txt file or a robots META tag. |

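
As a rough illustration of what "disallowed by a robots.txt file" means, the sketch below checks a path against a list of `Disallow` prefixes, following the prefix-matching rule of the original Robots Exclusion Protocol. `RobotsRuleSketch` and `isAllowed` are hypothetical names; how RobotExclusionSet actually stores and applies its rules, and when it throws PathDisallowedException, is not shown on this page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the package's RobotExclusionSet class.
final class RobotsRuleSketch {

    /**
     * Returns false if the path starts with any non-empty Disallow prefix.
     * An empty Disallow value disallows nothing, so it is skipped.
     */
    static boolean isAllowed(String path, List<String> disallowedPrefixes) {
        for (String prefix : disallowedPrefixes) {
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("/private/", "/tmp");
        System.out.println(isAllowed("/private/a.html", rules));
        System.out.println(isAllowed("/index.html", rules));
    }
}
```

A crawler built along these lines would run such a check before each download and signal a disallowed path to its caller, which is the role the table assigns to PathDisallowedException.
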
For command line interfaces see the main methods of the following classes: