Class HTMLPage

  extended by ir.webutils.HTMLPage
Direct Known Subclasses:

public class HTMLPage
extends java.lang.Object

HTMLPage is a representation of information about a web page.

Field Summary
protected  Link link
          The original link to this page
protected  java.util.List<Link> outLinks
          The links on this page
protected  java.lang.String text
          The text of the page
Constructor Summary
HTMLPage(Link link, java.lang.String text)
          Constructs an HTMLPage with the given link and text.
Method Summary
protected static addEndSlash( url)
          If the URL looks like a directory rather than a file, adds a "/" at the end so that it acts as a proper base URL for completing URLs in this page.
 boolean empty()
          Returns true if the page is empty or a 404 error.
 Link getLink()
          Returns the Link object that was used to access this page.
 java.util.List<Link> getOutLinks()
          Get the list of out links from this page.
 java.lang.String getText()
          Returns the full text of this page.
 boolean indexAllowed()
          Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag.
 void setOutLinks(java.util.List<Link> links)
          Sets the outLinks for this page to the given list.
 void write( dir, java.lang.String name)
          Writes web page to a file with a BASE HTML element with the original URL.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


protected final Link link
The original link to this page


protected final java.lang.String text
The text of the page


protected java.util.List<Link> outLinks
The links on this page

Constructor Detail


public HTMLPage(Link link,
                java.lang.String text)
Constructs an HTMLPage with the given link and text.

link - Link object to the given page.
text - The text of the page.
Method Detail


public java.lang.String getText()
Returns the full text of this page. None of the HTML is stripped out.

The text of this page.


public Link getLink()
Returns the Link object that was used to access this page.

The Link object that was used to access this page.


public void setOutLinks(java.util.List<Link> links)
Sets the outLinks for this page to the given list.


public java.util.List<Link> getOutLinks()
Get the list of out links from this page.


public boolean indexAllowed()
Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag. Always returns true in the default implementation.

true if and only if the page can be indexed. Always returns true in the default implementation.
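
Because the default implementation always returns true, honoring "NOINDEX" presumably requires a subclass that overrides indexAllowed(). The following is a minimal sketch of that idea, not the library's own code: Page is a hypothetical stand-in for HTMLPage, and RobotsAwarePage, its regular expression, and the indexable helper are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexAllowedSketch {
    /** Hypothetical stand-in for HTMLPage: by default every page may be indexed. */
    static class Page {
        protected final String text;

        Page(String text) { this.text = text; }

        String getText() { return text; }

        boolean indexAllowed() { return true; }
    }

    /** Hypothetical subclass that refuses indexing when a Robots META tag says NOINDEX. */
    static class RobotsAwarePage extends Page {
        // Matches any <meta ...> tag whose name attribute is "robots" (assumed pattern).
        private static final Pattern ROBOTS_META = Pattern.compile(
            "<meta\\b[^>]*\\bname\\s*=\\s*[\"']?robots[\"']?[^>]*>",
            Pattern.CASE_INSENSITIVE);

        RobotsAwarePage(String text) { super(text); }

        @Override
        boolean indexAllowed() {
            Matcher m = ROBOTS_META.matcher(text);
            while (m.find()) {
                if (m.group().toLowerCase(Locale.ROOT).contains("noindex")) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Client-side check: keep only the pages that allow indexing. */
    static List<Page> indexable(List<Page> pages) {
        List<Page> result = new ArrayList<>();
        for (Page page : pages) {
            if (page.indexAllowed()) {
                result.add(page);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Page> pages = new ArrayList<>();
        pages.add(new RobotsAwarePage("<html><head><title>ok</title></head></html>"));
        pages.add(new RobotsAwarePage(
            "<html><head><meta name=\"robots\" content=\"noindex,nofollow\"></head></html>"));
        System.out.println(indexable(pages).size() + " of " + pages.size() + " pages may be indexed");
    }
}
```

The client only ever calls indexAllowed(), so the same indexing loop works with both the default behavior and a subclass that inspects the page text.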


public boolean empty()
Returns true if the page is empty or a 404 error.


public void write( dir,
                  java.lang.String name)
Writes web page to a file with a BASE HTML element with the original URL.

dir - The directory to store the file in.
name - The name of the file.
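
The BASE element lets relative links in the saved copy keep resolving against the original URL. The sketch below illustrates one way this could be done; it is not the actual implementation of write. The class name, the withBase helper, the choice to insert the element right after the opening HEAD tag, and the use of java.nio.file.Path for the directory are all assumptions, since the parameter type of dir is not shown in this listing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WriteWithBaseSketch {
    // Opening <head> tag, with or without attributes (assumed pattern).
    private static final Pattern HEAD_OPEN =
        Pattern.compile("<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with a BASE element pointing at the original URL. */
    static String withBase(String html, String originalUrl) {
        String base = "<base href=\"" + originalUrl + "\">";
        Matcher head = HEAD_OPEN.matcher(html);
        if (head.find()) {
            // Insert immediately after the opening <head> tag.
            return html.substring(0, head.end()) + base + html.substring(head.end());
        }
        // No <head> found: fall back to prepending the element.
        return base + html;
    }

    /** Hypothetical counterpart of write(dir, name): stores the page text with a BASE element. */
    static Path write(Path dir, String name, String html, String originalUrl) throws IOException {
        Path target = dir.resolve(name);
        Files.write(target, withBase(html, originalUrl).getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pages");
        Path saved = write(dir, "P001.html",
            "<html><head><title>t</title></head><body><a href=\"next.html\">next</a></body></html>",
            "http://example.com/docs/");
        System.out.println(new String(Files.readAllBytes(saved), StandardCharsets.UTF_8));
    }
}
```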


protected static addEndSlash( url)
If the URL looks like a directory rather than a file, adds a "/" at the end so that it acts as a proper base URL for completing URLs in this page.
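
The trailing slash matters because relative references are resolved against everything up to the last "/" of the base. The sketch below shows the effect under an assumed heuristic; the actual test that addEndSlash uses to decide what "looks like a directory" is not documented here. The rule that a last path segment without a "." is a directory, the class name, and the use of java.net.URI are all illustrative assumptions.

```java
import java.net.URI;

class EndSlashSketch {
    /**
     * Hypothetical heuristic: if the last path segment contains no ".",
     * treat the address as a directory and append "/".
     */
    static URI addEndSlash(URI url) {
        String path = url.getPath();
        if (path == null || path.endsWith("/")) {
            return url; // opaque URI, or already a directory-style address
        }
        if (url.getQuery() != null || url.getFragment() != null) {
            return url; // appending "/" after a query or fragment would change its meaning
        }
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        if (lastSegment.contains(".")) {
            return url; // looks like a file, e.g. index.html
        }
        return URI.create(url.toString() + "/");
    }

    public static void main(String[] args) {
        URI raw = URI.create("http://example.com/docs");
        URI fixed = addEndSlash(raw);
        // Without the slash, "docs" is treated as a file name and dropped on resolution.
        System.out.println(raw.resolve("page.html"));   // http://example.com/page.html
        System.out.println(fixed.resolve("page.html")); // http://example.com/docs/page.html
    }
}
```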