ir.webutils
Class RobotsMetaTagParser

java.lang.Object
  extended by javax.swing.text.html.HTMLEditorKit.ParserCallback
      extended by ir.webutils.RobotsMetaTagParser

public final class RobotsMetaTagParser
extends javax.swing.text.html.HTMLEditorKit.ParserCallback

Parser callback that extracts robots META tag information.
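
A ParserCallback does not tokenize HTML itself. A parser such as `javax.swing.text.html.parser.ParserDelegator` reads the markup and calls back into methods like `handleStartTag` and `handleSimpleTag` as it goes. The sketch below illustrates that model with a hypothetical `MetaTagCollector` that records every META `name`/`content` pair. It is not the implementation of `RobotsMetaTagParser`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback (not part of ir.webutils): collects META name/content pairs.
class MetaTagCollector extends HTMLEditorKit.ParserCallback {
    final Map<String, String> metas = new LinkedHashMap<>();

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        // Swing's parser reports empty elements such as META through handleSimpleTag.
        if (tag != HTML.Tag.META) {
            return;
        }
        Object name = attributes.getAttribute(HTML.Attribute.NAME);
        Object content = attributes.getAttribute(HTML.Attribute.CONTENT);
        if (name != null && content != null) {
            metas.put(name.toString(), content.toString());
        }
    }

    // Parses an HTML string and returns the collected name/content pairs.
    static Map<String, String> collect(String html) {
        MetaTagCollector collector = new MetaTagCollector();
        try {
            // ignoreCharSet = true: the input is already a decoded Java String.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.metas;
    }
}
```

Swing's parser delivers empty elements such as META through `handleSimpleTag` rather than `handleStartTag`. This is consistent with this class overriding `handleSimpleTag` to find robots META tags.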


Field Summary
 
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
 
Constructor Summary
RobotsMetaTagParser()
           
RobotsMetaTagParser(java.net.URL url)
           
RobotsMetaTagParser(java.net.URL url, java.lang.String page)
           
 
Method Summary
 void handleSimpleTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
          Checks for robots META tags.
 boolean index()
          Indicates whether the page can be indexed.
 java.util.List<Link> parseMetaTags()
          Parses the document and returns a list of links that cannot be followed.
 void setPage(java.lang.String page)
           
 void setUrl(java.net.URL url)
           
 
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleEndTag, handleError, handleStartTag, handleText
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RobotsMetaTagParser

public RobotsMetaTagParser()

RobotsMetaTagParser

public RobotsMetaTagParser(java.net.URL url)

RobotsMetaTagParser

public RobotsMetaTagParser(java.net.URL url,
                           java.lang.String page)
Method Detail

setPage

public void setPage(java.lang.String page)

setUrl

public void setUrl(java.net.URL url)

handleSimpleTag

public void handleSimpleTag(javax.swing.text.html.HTML.Tag tag,
                            javax.swing.text.MutableAttributeSet attributes,
                            int position)
Checks for robots META tags. If a robots META tag is found, its content (if any) is extracted and stored. If the page contains several robots META tags, only the last one is considered.

Overrides:
handleSimpleTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback
Parameters:
tag - The type of tag that caused this method to be called. Only META tags are handled; any other kind of tag causes this method to do nothing.
attributes - The attributes of this tag. If the tag has a "name" attribute whose value is "robots" (case-insensitive), the "content" attribute is checked and, if present, stored.
position - The position of the tag in the document. Not used.
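
The rules above can be restated as a small callback. The following hypothetical `RobotsContentCallback` illustrates those rules only and is not taken from the ir.webutils source. It also makes one assumption the documentation does not address: a later robots META tag *without* a `content` attribute leaves an earlier value in place.

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical callback illustrating the documented rules; not the ir.webutils source.
class RobotsContentCallback extends HTMLEditorKit.ParserCallback {
    // Content of the most recent robots META tag that had one, or null if none was seen.
    private String robotsContent;

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        // Only META tags are of interest; position is ignored.
        if (tag != HTML.Tag.META) {
            return;
        }
        Object name = attributes.getAttribute(HTML.Attribute.NAME);
        // The name comparison is case-insensitive ("robots", "ROBOTS", ...).
        if (name == null || !"robots".equalsIgnoreCase(name.toString())) {
            return;
        }
        Object content = attributes.getAttribute(HTML.Attribute.CONTENT);
        if (content != null) {
            // Replace any earlier value, so the last robots tag with content wins.
            robotsContent = content.toString();
        }
    }

    String robotsContent() {
        return robotsContent;
    }
}
```

Because `handleSimpleTag` is an ordinary public method, these rules can be exercised directly with a `javax.swing.text.SimpleAttributeSet`, without running a parser.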

parseMetaTags

public java.util.List<Link> parseMetaTags()
Parses the document and returns a list of links that cannot be followed. This method also sets a flag indicating whether this page can be indexed; clients can then call index() to check the value of this flag.

Returns:
A List of Links that should not be followed from this page.
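
By convention, the `content` value of a robots META tag is a comma-separated, case-insensitive list of directives such as `noindex` and `nofollow`, with `none` meaning both. The documentation does not show how `parseMetaTags` interprets that value. The following is therefore only a sketch under that convention, and `RobotsDirectives` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical helper (not part of ir.webutils): interprets a robots META content value.
final class RobotsDirectives {
    final boolean index;
    final boolean follow;

    private RobotsDirectives(boolean index, boolean follow) {
        this.index = index;
        this.follow = follow;
    }

    // A null content (no robots META tag) yields the defaults: index and follow.
    static RobotsDirectives parse(String content) {
        boolean index = true;
        boolean follow = true;
        if (content != null) {
            for (String token : content.split(",")) {
                switch (token.trim().toLowerCase(Locale.ROOT)) {
                    case "noindex":
                        index = false;
                        break;
                    case "nofollow":
                        follow = false;
                        break;
                    case "none": // shorthand for "noindex, nofollow"
                        index = false;
                        follow = false;
                        break;
                    default: // "index", "follow", "all" and unknown tokens keep the defaults
                        break;
                }
            }
        }
        return new RobotsDirectives(index, follow);
    }
}
```

Under this reading, `index` would supply the flag returned by `index()`. A `follow` value of `false` would make every outgoing link on the page a candidate for the returned list. That mapping is an assumption; the documentation says only that the list holds links that should not be followed.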

index

public boolean index()
Indicates whether the page can be indexed. Call this method only after parseMetaTags() has been called.

Returns:
true if and only if the page can be indexed.
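
The documentation states an ordering requirement but does not say what `index()` returns if it is called too early. One hypothetical way to make such a contract explicit is to record whether parsing has happened and fail fast otherwise. `IndexGate` and `recordParse` are illustrative names, not part of this class.

```java
// Hypothetical sketch (not the ir.webutils implementation): enforces the
// "call index() only after parseMetaTags()" contract with an explicit state check.
final class IndexGate {
    private boolean parsed;
    private boolean indexable;

    // Stand-in for the point where parseMetaTags() sets the index flag.
    void recordParse(boolean indexable) {
        this.indexable = indexable;
        this.parsed = true;
    }

    boolean index() {
        if (!parsed) {
            throw new IllegalStateException("index() called before parseMetaTags()");
        }
        return indexable;
    }
}
```

Failing fast turns a silent misuse, such as reading a default flag value before parsing, into an immediate error at the call site.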