Spider that limits itself to the directory it started in.
Graph data structure.
HTMLPage is a representation of information about a web page.
HTMLPageRetriever allows clients to download web pages from URLs.
HTMLParserMaker allows clients to retrieve an HTMLEditorKit.Parser instance.
Link is a class that contains a URL.
LinkExtractor defines a callback that extracts links from an HTML document and provides a method for parsing a document.
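Since the LinkExtractor source is not shown here, the sketch below illustrates the general pattern under assumed names: a Swing `HTMLEditorKit.ParserCallback` subclass that collects `href` values from anchor tags, driven by `ParserDelegator`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical sketch (class and method names are assumptions, not the
// actual LinkExtractor API): collect href attributes from <a> start tags.
public class LinkExtractorSketch extends HTMLEditorKit.ParserCallback {
    private final List<String> links = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) links.add(href.toString());
        }
    }

    public List<String> getLinks() { return links; }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"http://example.com/a\">A</a>"
                    + "<a href=\"/b.html\">B</a></body></html>";
        LinkExtractorSketch cb = new LinkExtractorSketch();
        // The final argument tells the parser to ignore charset directives.
        new ParserDelegator().parse(new StringReader(html), cb, true);
        System.out.println(cb.getLinks()); // prints [http://example.com/a, /b.html]
    }
}
```

Relative links such as `/b.html` would still need to be resolved against the page's base URL before a spider could follow them.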
Node in the Graph data structure.
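The Graph and Node implementations are not shown; a minimal adjacency-list sketch (all names hypothetical) of the shape a spider might use to record page-to-page links:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal directed graph over generic node labels.
public class GraphSketch<T> {
    private final Map<T, Set<T>> adjacency = new HashMap<>();

    // Add a node with no edges (no-op if it already exists).
    public void addNode(T node) {
        adjacency.computeIfAbsent(node, k -> new LinkedHashSet<>());
    }

    // Add a directed edge, creating both endpoints as needed.
    public void addEdge(T from, T to) {
        addNode(to);
        adjacency.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    public Set<T> neighbors(T node) {
        return adjacency.getOrDefault(node, Collections.emptySet());
    }

    public boolean contains(T node) {
        return adjacency.containsKey(node);
    }
}
```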
RobotExclusionSet provides support for the Robots Exclusion Protocol.
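The actual RobotExclusionSet API is not shown, so the following is only a sketch of the core rule of the Robots Exclusion Protocol: a path is disallowed when it starts with any `Disallow` prefix from the applicable record, and an empty `Disallow` value disallows nothing. All names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of robots.txt Disallow handling by prefix match.
public class RobotExclusionSketch {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    // An empty Disallow value means "allow everything", so it is skipped.
    public void addDisallow(String prefix) {
        if (!prefix.isEmpty()) disallowedPrefixes.add(prefix);
    }

    // A path is disallowed if it begins with any recorded prefix.
    public boolean isDisallowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) return true;
        }
        return false;
    }
}
```

For example, after `addDisallow("/cgi-bin/")`, the path `/cgi-bin/search` is disallowed while `/index.html` is not.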
Parser callback that extracts robots META tag information.
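Since that callback's source is not shown, here is a hedged sketch (names are assumptions) of how a Swing parser callback can inspect the robots META tag. `<meta>` is an empty element, so the parser reports it through `handleSimpleTag` rather than `handleStartTag`.

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical sketch: track noindex/nofollow directives from
// <meta name="robots" content="..."> while parsing a page.
public class RobotsMetaSketch extends HTMLEditorKit.ParserCallback {
    private boolean indexAllowed = true;
    private boolean followAllowed = true;

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag != HTML.Tag.META) return;
        Object name = attrs.getAttribute(HTML.Attribute.NAME);
        Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
        if (name == null || content == null) return;
        if (!"robots".equalsIgnoreCase(name.toString())) return;
        String directives = content.toString().toLowerCase();
        if (directives.contains("noindex")) indexAllowed = false;
        if (directives.contains("nofollow")) followAllowed = false;
    }

    public boolean isIndexAllowed() { return indexAllowed; }
    public boolean isFollowAllowed() { return followAllowed; }
}
```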
SafeHTMLPage is an immutable representation of information about a web page, including whether the page may be indexed.
Keeps track of Robot Exclusion information.
A spider that limits itself to a given site.
Spider defines a framework for writing a web crawler.
Lightweight object that stores both the number of DIFFERENT strings from a set of search strings that are found in a text and the total number of occurrences in the text of ANY string in the set.
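The distinction between the two counts can be made concrete with a small sketch (class and field names are assumptions; occurrences are counted without overlap):

```java
import java.util.List;

// Hypothetical sketch: for a text and a set of search strings, record how
// many DIFFERENT strings appear at all, and the TOTAL occurrences of any.
public class SearchCounts {
    public final int distinctStringsFound;
    public final int totalOccurrences;

    public SearchCounts(String text, List<String> searchStrings) {
        int distinct = 0;
        int total = 0;
        for (String s : searchStrings) {
            int count = 0;
            // Count non-overlapping occurrences of s in text.
            for (int i = text.indexOf(s); i >= 0; i = text.indexOf(s, i + s.length())) {
                count++;
            }
            if (count > 0) distinct++;
            total += count;
        }
        distinctStringsFound = distinct;
        totalOccurrences = total;
    }
}
```

For the text `"the cat and the dog"` and the search strings `["the", "cat", "fish"]`, two different strings are found ("the" and "cat"), with three occurrences in total ("the" twice, "cat" once).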
URLChecker tries to clean up some URLs that do not conform to the standard and cause confusion.
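URLChecker's actual rules are not listed here, so the sketch below shows only the kind of cleanup such a class might perform; the specific fixes (lowercasing the scheme and host, dropping a default port, defaulting an empty path to "/") are assumptions, as is the class name.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch of URL normalization; unparseable input is returned
// unchanged, matching the "tries to clean up" spirit of the description.
public final class URLCleanupSketch {
    public static String clean(String spec) {
        try {
            URL url = new URL(spec);
            String protocol = url.getProtocol().toLowerCase();
            String host = url.getHost().toLowerCase();
            int port = url.getPort();
            // Omit the port when it is absent or equal to the scheme default.
            String portPart = (port == -1 || port == url.getDefaultPort())
                    ? "" : ":" + port;
            String path = url.getPath().isEmpty() ? "/" : url.getPath();
            String query = url.getQuery() == null ? "" : "?" + url.getQuery();
            return protocol + "://" + host + portPart + path + query;
        } catch (MalformedURLException e) {
            return spec; // leave anything we cannot parse alone
        }
    }
}
```

For example, `clean("HTTP://Example.COM:80")` yields `http://example.com/`, so two spellings of the same address compare equal and are not crawled twice.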
WebPage is a static utility class that provides operations for downloading web pages.
WebPageViewer contains utilities to download and display HTML pages.
YahooCategoryLinkExtractor defines a callback for the Swing HTML parser that extracts links to subcategories from a Yahoo directory page.
YahooSiteLinkExtractor defines a callback that extracts site links from a Yahoo directory page and provides functionality to parse a document.
Specific spider for extracting and saving a random set of a given number of pages from a particular topic category in the Yahoo directory.
PathDisallowedException is thrown to indicate that a client program tried to access a path that was disallowed by either a robots.txt file or a robots META tag.
For command line interfaces see the main methods of the following classes: