ir.webutils
Class RobotExclusionSet

java.lang.Object
  extended by java.util.AbstractCollection<java.lang.String>
      extended by java.util.AbstractSet<java.lang.String>
          extended by ir.webutils.RobotExclusionSet
All Implemented Interfaces:
java.lang.Iterable<java.lang.String>, java.util.Collection<java.lang.String>, java.util.Set<java.lang.String>

public class RobotExclusionSet
extends java.util.AbstractSet<java.lang.String>

RobotExclusionSet provides support for the Robots Exclusion Protocol. It can parse a robots.txt file and check whether access to a given path has been disallowed by that file. It can also be used to exclude files linked to from a page that specifies NOFOLLOW in its Robots META tag.
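As a rough illustration of the NOFOLLOW case, the sketch below checks whether a page's Robots META tag asks crawlers not to follow its links. It is not taken from this class: the name RobotsMetaSketch, the method isNoFollow, and the regular expression are all hypothetical, and the pattern assumes the name attribute comes before content. A crawler that finds such a tag could then add the page's outgoing links to a RobotExclusionSet with add.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of ir.webutils.
class RobotsMetaSketch {
    // Assumes the attribute order name=... content=..., which real pages may not follow.
    private static final Pattern ROBOTS_META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']robots[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns true if any Robots META tag in the page lists NOFOLLOW (or NONE,
    // which stands for NOINDEX plus NOFOLLOW).
    static boolean isNoFollow(String html) {
        Matcher m = ROBOTS_META.matcher(html);
        while (m.find()) {
            for (String directive : m.group(1).split(",")) {
                String d = directive.trim().toLowerCase(Locale.ROOT);
                if (d.equals("nofollow") || d.equals("none")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String page = "<html><head>"
            + "<META NAME=\"ROBOTS\" CONTENT=\"NOINDEX, NOFOLLOW\">"
            + "</head><body><a href=\"/a.html\">a</a></body></html>";
        System.out.println(isNoFollow(page)); // prints "true"
    }
}
```

A regular expression is used here only to keep the sketch short; a real crawler would more likely read the tag from its HTML parser.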


Constructor Summary
RobotExclusionSet()
          Constructs an empty set.
RobotExclusionSet(java.lang.String site)
          Constructs a set containing the paths in the robots.txt file for this site.
 
Method Summary
 boolean add(java.lang.String o)
           
 boolean contains(java.lang.String path)
          Checks to see if a path is prohibited by this set.
 java.util.Iterator<java.lang.String> iterator()
           
static void main(java.lang.String[] args)
          For testing only.
 int size()
           
 
Methods inherited from class java.util.AbstractSet
equals, hashCode, removeAll
 
Methods inherited from class java.util.AbstractCollection
addAll, clear, contains, containsAll, isEmpty, remove, retainAll, toArray, toArray, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.Set
addAll, clear, contains, containsAll, isEmpty, remove, retainAll, toArray, toArray
 

Constructor Detail

RobotExclusionSet

public RobotExclusionSet()
Constructs an empty set.


RobotExclusionSet

public RobotExclusionSet(java.lang.String site)
Constructs a set containing the paths in the robots.txt file for this site. The robots.txt file should conform to the Robots Exclusion Protocol specification, available at http://www.robotstxt.org/wc/norobots.html.

Parameters:
site - The name of the site
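
To make the parsing step concrete, here is a minimal sketch of collecting Disallow paths from the text of a robots.txt file. It is not this class's implementation: the name RobotsTxtSketch and the method disallowedForAll are hypothetical, the input is passed as a string rather than fetched from the site, and the sketch assumes that only records naming the wildcard user agent "*" are honored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical parser for illustration only; not the ir.webutils source.
class RobotsTxtSketch {
    // Collects the Disallow paths from records whose User-agent is "*".
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> paths = new ArrayList<>();
        boolean applies = false; // true while inside a record that names "*"
        for (String raw : robotsTxt.split("\\R")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                // A truly blank line ends the record; a comment-only line does not.
                if (hash < 0) {
                    applies = false;
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "field: value" line, ignore it
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (value.equals("*")) {
                    applies = true;
                }
            } else if (field.equals("disallow") && applies && !value.isEmpty()) {
                // An empty Disallow value means nothing is excluded, so it is skipped.
                paths.add(value);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String sample =
            "# sample file\n"
            + "User-agent: *\n"
            + "Disallow: /cgi-bin/ # scripts\n"
            + "Disallow: /tmp/\n"
            + "\n"
            + "User-agent: examplebot\n"
            + "Disallow: /\n";
        System.out.println(disallowedForAll(sample)); // prints "[/cgi-bin/, /tmp/]"
    }
}
```

Each returned path could then be passed to add to populate the set.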
Method Detail

size

public int size()
Specified by:
size in interface java.util.Collection<java.lang.String>
Specified by:
size in interface java.util.Set<java.lang.String>
Specified by:
size in class java.util.AbstractCollection<java.lang.String>

add

public boolean add(java.lang.String o)
Specified by:
add in interface java.util.Collection<java.lang.String>
Specified by:
add in interface java.util.Set<java.lang.String>
Overrides:
add in class java.util.AbstractCollection<java.lang.String>

iterator

public java.util.Iterator<java.lang.String> iterator()
Specified by:
iterator in interface java.lang.Iterable<java.lang.String>
Specified by:
iterator in interface java.util.Collection<java.lang.String>
Specified by:
iterator in interface java.util.Set<java.lang.String>
Specified by:
iterator in class java.util.AbstractCollection<java.lang.String>

contains

public boolean contains(java.lang.String path)
Checks to see if a path is prohibited by this set. A path is prohibited if it starts with an entry in this set.

Parameters:
path - String object representing the path.
Returns:
true iff path is not null and, for at least one element e in this set, path.startsWith(e).
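
The prefix rule stated above (a path is prohibited if it starts with an entry in the set) can be sketched as follows. This is a hypothetical stand-in, not the source of this class: the name PrefixExclusionSketch and the method isProhibited are invented for the example, and treating a null path as not prohibited is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; not the ir.webutils source.
class PrefixExclusionSketch {
    private final Set<String> disallowed = new LinkedHashSet<>();

    // Adds a disallowed path prefix; returns false if it was already present.
    boolean add(String prefix) {
        return disallowed.add(prefix);
    }

    // A path is prohibited if it starts with any stored prefix.
    boolean isProhibited(String path) {
        if (path == null) {
            return false; // assumption: null is treated as not prohibited
        }
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PrefixExclusionSketch set = new PrefixExclusionSketch();
        set.add("/cgi-bin/");
        set.add("/private");
        System.out.println(set.isProhibited("/cgi-bin/search")); // prints "true"
        System.out.println(set.isProhibited("/index.html"));     // prints "false"
    }
}
```

Note that this is a prefix match rather than an exact membership test, so an entry such as /private also prohibits /private/notes.html and /privateer.html.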

main

public static void main(java.lang.String[] args)
For testing only. Parses the robots.txt file for a particular site.