ir.webutils
Class SafeHTMLPage

java.lang.Object
  extended by ir.webutils.HTMLPage
      extended by ir.webutils.SafeHTMLPage

public final class SafeHTMLPage
extends HTMLPage

SafeHTMLPage is an immutable representation of a web page that also records whether or not the page can be indexed. This class is intended to be used in conjunction with SafeHTMLPageRetriever to make it easier for clients to write spiders that obey both the Robots Exclusion Protocol and the Robots META tags.


Field Summary
 
Fields inherited from class ir.webutils.HTMLPage
link, outLinks, text
 
Constructor Summary
SafeHTMLPage(Link link, java.lang.String text, boolean index)
          Constructs a SafeHTMLPage with the given link, text, and an indication of whether or not indexing is allowed.
 
Method Summary
 boolean indexAllowed()
          Indicates whether or not indexing is allowed by the page's Robots META tag.
 
Methods inherited from class ir.webutils.HTMLPage
addEndSlash, empty, getLink, getOutLinks, getText, setOutLinks, write
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SafeHTMLPage

public SafeHTMLPage(Link link,
                    java.lang.String text,
                    boolean index)
Constructs a SafeHTMLPage with the given link, text, and an indication of whether or not indexing is allowed.

Parameters:
link - A Link object representing the given page.
text - The text of the page.
index - Should be true if and only if the page can be indexed.
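
The constructor call can be sketched as follows. This is only an illustration: the nested Link, HTMLPage, and SafeHTMLPage classes are hypothetical stand-ins that mirror the signatures documented on this page so the example compiles on its own. The Link(String) constructor, the getUrl() accessor, and the example URL are assumptions, not part of ir.webutils as documented here.

```java
public class SafeHTMLPageConstructionSketch {

    // Hypothetical stand-in for ir.webutils.Link; the real class may differ.
    static class Link {
        private final String url;
        Link(String url) { this.url = url; }
        String getUrl() { return url; }
    }

    // Hypothetical stand-in for ir.webutils.HTMLPage, reduced to what is used here.
    static class HTMLPage {
        protected final Link link;
        protected final String text;
        HTMLPage(Link link, String text) {
            this.link = link;
            this.text = text;
        }
        Link getLink() { return link; }
        String getText() { return text; }
        // Assumption: a plain HTMLPage places no restriction on indexing.
        boolean indexAllowed() { return true; }
    }

    // Stand-in mirroring the documented constructor and indexAllowed() method.
    static final class SafeHTMLPage extends HTMLPage {
        private final boolean index;
        SafeHTMLPage(Link link, String text, boolean index) {
            super(link, text);
            this.index = index;
        }
        @Override
        boolean indexAllowed() { return index; }
    }

    public static void main(String[] args) {
        Link link = new Link("http://example.com/private.html");
        // A retriever that saw a NOINDEX directive on the page would pass false here.
        SafeHTMLPage page = new SafeHTMLPage(link, "<html>...</html>", false);
        System.out.println(page.getLink().getUrl() + " indexAllowed=" + page.indexAllowed());
    }
}
```

Because the flag is fixed at construction time, a retriever is expected to decide whether indexing is allowed before it builds the page object.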

Method Detail

indexAllowed

public boolean indexAllowed()
Indicates whether or not indexing is allowed by the page's Robots META tag. Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag. Clients should also make sure to use a page retriever that supports Robots META tags, such as SafeHTMLPageRetriever.

Overrides:
indexAllowed in class HTMLPage
Returns:
true if and only if the page can be indexed.
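
A minimal sketch of the check a spider might perform before indexing is shown below. The nested SafeHTMLPage class is a simplified, hypothetical stand-in (it stores a URL string instead of a Link) so the example runs on its own. The indexableUrls helper and the example URLs are likewise assumptions made for illustration, not part of ir.webutils.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexAllowedSketch {

    // Hypothetical, simplified stand-in for ir.webutils.SafeHTMLPage.
    static final class SafeHTMLPage {
        private final String url;
        private final String text;
        private final boolean index;

        SafeHTMLPage(String url, String text, boolean index) {
            this.url = url;
            this.text = text;
            this.index = index;
        }

        String getUrl() { return url; }
        String getText() { return text; }
        boolean indexAllowed() { return index; }
    }

    // Returns the URLs of the pages that may be indexed, skipping NOINDEX pages.
    static List<String> indexableUrls(List<SafeHTMLPage> pages) {
        List<String> urls = new ArrayList<>();
        for (SafeHTMLPage page : pages) {
            // Call indexAllowed() before indexing, as the method description advises.
            if (page.indexAllowed()) {
                urls.add(page.getUrl());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<SafeHTMLPage> pages = new ArrayList<>();
        pages.add(new SafeHTMLPage("http://example.com/public.html", "public text", true));
        pages.add(new SafeHTMLPage("http://example.com/private.html", "private text", false));
        System.out.println(indexableUrls(pages));
    }
}
```

A real spider would pass each allowed page to its indexer rather than collecting URLs. The list is used here only to keep the sketch small.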