ir.webutils
Class Link

java.lang.Object
  extended by ir.webutils.Link

public class Link
extends java.lang.Object

Link is a class that contains a URL. Subclasses of Link may keep additional information (such as anchor text and other attributes).
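
The following is a minimal sketch of how a subclass might carry anchor text alongside the URL. It does not use the real ir.webutils.Link: the nested Link class is a stand-in that reproduces only the members documented on this page, and AnchoredLink, getAnchorText, and parse are hypothetical names introduced for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

// URL constructors are deprecated since Java 20 but are still available.
@SuppressWarnings("deprecation")
public class AnchoredLinkSketch {

    /** Stand-in for ir.webutils.Link, limited to members documented on this page. */
    public static class Link {
        private URL url;

        protected Link() {
            // Intended for subclasses only.
        }

        public Link(URL url) {
            this.url = url;
        }

        public final URL getURL() {
            return url;
        }
    }

    /** Hypothetical subclass that also remembers the anchor text of the link. */
    public static class AnchoredLink extends Link {
        private final String anchorText;

        public AnchoredLink(URL url, String anchorText) {
            super(url);
            this.anchorText = anchorText;
        }

        public String getAnchorText() {
            return anchorText;
        }
    }

    /** Helper (an assumption of this sketch): parse a string into a URL without a checked exception. */
    public static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + s, e);
        }
    }

    public static void main(String[] args) {
        AnchoredLink link = new AnchoredLink(parse("http://www.example.com/docs"), "Documentation");
        System.out.println(link.getURL() + " [" + link.getAnchorText() + "]");
    }
}
```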


Constructor Summary
protected Link()
          May be subclassed.
  Link(java.lang.String urlName)
          Constructs a link with the specified URL string.
  Link(java.net.URL url)
          Constructs a link with the specified URL.
 
Method Summary
static java.net.URL cleanURL(java.net.URL url)
          Standardizes a URL by removing trailing slashes, URL-decoding it, replacing the UTCS-specific "/users/user" path with "/~user", and removing a set of common index pages.
 boolean equals(java.lang.Object o)
           
 java.net.URL getURL()
          Returns the URL of this link.
 int hashCode()
           
static void main(java.lang.String[] args)
           
static java.net.URL removeEndSlash(java.net.URL url)
          Removes the slash at the end of a URL to normalize it.
static java.net.URL removeRef(java.net.URL url)
          Removes the internal "ref" pointer from a URL, if there is one.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Link

protected Link()
May be subclassed. This constructor should not be invoked by clients of Link.


Link

public Link(java.net.URL url)
Constructs a link with the specified URL.

Parameters:
url - The URL for this link.

Link

public Link(java.lang.String urlName)
Constructs a link with the specified URL string.

Parameters:
urlName - The URL string for this link.

Method Detail

getURL

public final java.net.URL getURL()
Returns the URL of this link.

Returns:
The URL of this link.

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

equals

public boolean equals(java.lang.Object o)
Overrides:
equals in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

cleanURL

public static java.net.URL cleanURL(java.net.URL url)
Standardizes a URL by removing trailing slashes, URL-decoding it, replacing the UTCS-specific "/users/user" path with "/~user", and removing a set of common index pages. This code is not robust enough for the general web, but it makes the spider behave better on toy examples.

Parameters:
url - The unnormalized URL
Returns:
a cleaned, normalized URL as described above
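
The normalization steps described above can be sketched as follows. This is not the actual implementation of cleanURL: the list of index pages, the order of the steps, the handling of the query string, and the helper parse are all assumptions made for this sketch, and the "/users/" rewrite is applied to any host for simplicity.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// URL constructors are deprecated since Java 20 but are still available.
@SuppressWarnings("deprecation")
public class CleanUrlSketch {

    // Assumed list: the documentation does not say which index pages are removed.
    static final List<String> INDEX_PAGES = List.of("index.html", "index.htm");

    /** Helper (an assumption of this sketch): parse a string into a URL without a checked exception. */
    public static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + s, e);
        }
    }

    /** Returns a normalized copy of url, or url itself if it cannot be rebuilt. */
    public static URL cleanURL(URL url) {
        // URL-decode the path. Note that URLDecoder also turns '+' into a space.
        String path = URLDecoder.decode(url.getPath(), StandardCharsets.UTF_8);

        // Rewrite "/users/name/..." to "/~name/...".
        if (path.startsWith("/users/")) {
            path = "/~" + path.substring("/users/".length());
        }

        // Drop a trailing index page, keeping the slash before it for now.
        for (String index : INDEX_PAGES) {
            if (path.endsWith("/" + index)) {
                path = path.substring(0, path.length() - index.length());
                break;
            }
        }

        // Remove trailing slashes.
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }

        // Keep the query string, if any; the ref is dropped in this sketch.
        String query = url.getQuery();
        String file = (query == null) ? path : path + "?" + query;

        try {
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), file);
        } catch (MalformedURLException e) {
            return url;
        }
    }

    public static void main(String[] args) {
        // "someuser" is a made-up user name.
        URL raw = parse("http://www.cs.utexas.edu/users/someuser/index.html");
        System.out.println(cleanURL(raw)); // prints http://www.cs.utexas.edu/~someuser
    }
}
```

Working on the decoded path rather than on the whole URL string keeps the host and query string out of the rewrite rules, at the cost of mishandling paths whose decoded form contains "?" or "#".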

removeEndSlash

public static java.net.URL removeEndSlash(java.net.URL url)
Removes the slash at the end of a URL to normalize it.
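
A minimal sketch of trailing-slash removal, assuming that a single final "/" is stripped from the string form of the URL and that URLs without one are returned unchanged. The real method may behave differently; the class name and the helper parse are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// URL constructors are deprecated since Java 20 but are still available.
@SuppressWarnings("deprecation")
public class RemoveEndSlashSketch {

    /** Helper (an assumption of this sketch): parse a string into a URL without a checked exception. */
    public static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + s, e);
        }
    }

    /** Returns url without one trailing slash, or url itself if there is none. */
    public static URL removeEndSlash(URL url) {
        String text = url.toString();
        if (!text.endsWith("/")) {
            return url;
        }
        try {
            return new URL(text.substring(0, text.length() - 1));
        } catch (MalformedURLException e) {
            // Fall back to the original URL if the shortened form does not parse.
            return url;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEndSlash(parse("http://www.example.com/dir/"))); // prints http://www.example.com/dir
    }
}
```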


removeRef

public static java.net.URL removeRef(java.net.URL url)
Removes the internal "ref" pointer (the part after "#") from a URL, if there is one. The ref is not part of the URL of the page itself.
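
A minimal sketch of ref removal using only java.net.URL accessors. It assumes that a URL without a ref is returned unchanged and that user-info in the authority can be ignored; the real method may differ. The class name and the helper parse are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// URL constructors are deprecated since Java 20 but are still available.
@SuppressWarnings("deprecation")
public class RemoveRefSketch {

    /** Helper (an assumption of this sketch): parse a string into a URL without a checked exception. */
    public static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + s, e);
        }
    }

    /** Returns a copy of url without its "#ref" part, or url itself if it has no ref. */
    public static URL removeRef(URL url) {
        if (url.getRef() == null) {
            return url;
        }
        try {
            // getFile() is the path plus any query string; it never includes the ref.
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), url.getFile());
        } catch (MalformedURLException e) {
            return url;
        }
    }

    public static void main(String[] args) {
        URL withRef = parse("http://www.example.com/docs/page.html#section2");
        System.out.println(removeRef(withRef)); // prints http://www.example.com/docs/page.html
    }
}
```

Two URLs that differ only in their ref point at the same page, so a spider that strips the ref before comparing links avoids fetching that page twice.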


main

public static void main(java.lang.String[] args)