SafeHTMLPageRetriever

java.lang.Object
- ir.webutils.HTMLPageRetriever
- - ir.webutils.SafeHTMLPageRetriever

```
public final class SafeHTMLPageRetriever
extends HTMLPageRetriever
```
Keeps track of Robots Exclusion information. Clients can use this class to ensure that they do not access pages prohibited by either the Robots Exclusion Protocol (robots.txt) or Robots META tags.
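
To make "keeping track of Robots Exclusion information" concrete, the following is a minimal sketch of per-host disallow rules with prefix matching. It is not the ir.webutils implementation: the class `RobotExclusionSketch` and its methods `addDisallow` and `isAllowed` are hypothetical names introduced here, and the sketch ignores user-agent groups, `Allow` lines, and META tags.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the class documented on this page.
class RobotExclusionSketch {
    // Maps a host name to the path prefixes its robots.txt disallows.
    private final Map<String, List<String>> disallowedByHost = new HashMap<>();

    // Records one "Disallow: <prefix>" rule for the given host.
    void addDisallow(String host, String pathPrefix) {
        if (pathPrefix.isEmpty()) {
            return; // an empty Disallow value places no restriction
        }
        disallowedByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(pathPrefix);
    }

    // Returns true when no recorded prefix for the host matches the start of the path.
    boolean isAllowed(String host, String path) {
        List<String> prefixes = disallowedByHost.getOrDefault(host, Collections.emptyList());
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RobotExclusionSketch rules = new RobotExclusionSketch();
        rules.addDisallow("example.com", "/private/");
        System.out.println(rules.isAllowed("example.com", "/private/report.html"));
        System.out.println(rules.isAllowed("example.com", "/index.html"));
    }
}
```

A retriever built along these lines would consult `isAllowed` before each download and add rules whenever it reads a new robots.txt file.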

- Constructor Summary
  
  Constructors
  Constructor and Description
  
  SafeHTMLPageRetriever()
- Method Summary
  
  Methods
  Modifier and Type Method and Description
  
  HTMLPage getHTMLPage(Link link)
  Tries to download the given web page.
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - SafeHTMLPageRetriever
```
public SafeHTMLPageRetriever()
```
- Method Detail
  - getHTMLPage
```
public HTMLPage getHTMLPage(Link link)
                     throws PathDisallowedException
```
    Tries to download the given web page. Throws PathDisallowedException if access to the page is prohibited. Also updates Robots Exclusion information based on the new page.
    
    Overrides:
    
    getHTMLPage in class HTMLPageRetriever
    
    Parameters:
    link - The Link to follow and download.
    
    Returns:
    The web page that the given link points to.
    
    Throws:
    
    PathDisallowedException - If the link's URL is disallowed by a robots.txt file or a Robots META tag.
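
A caller would typically wrap getHTMLPage in a try/catch and skip any link that is refused. The sketch below shows that pattern under stated assumptions: `Link`, `HTMLPage`, `PathDisallowedException`, and `StubSafeRetriever` are stand-in stubs defined locally so the example compiles without ir.webutils, the `Link(String)` constructor is assumed, and the stub's rule (refuse anything under `/private/`) is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SafeRetrievalSketch {
    // Stand-in stubs (assumptions), not the real ir.webutils classes.
    static class Link {
        final String url;
        Link(String url) { this.url = url; }
    }

    static class HTMLPage {
        final Link link;
        HTMLPage(Link link) { this.link = link; }
    }

    static class PathDisallowedException extends Exception {
        PathDisallowedException(String message) { super(message); }
    }

    // Stand-in retriever: pretends every URL containing /private/ is disallowed.
    static class StubSafeRetriever {
        HTMLPage getHTMLPage(Link link) throws PathDisallowedException {
            if (link.url.contains("/private/")) {
                throw new PathDisallowedException(link.url);
            }
            return new HTMLPage(link);
        }
    }

    // Fetches every link, silently skipping those the retriever refuses.
    static List<HTMLPage> fetchAllowed(StubSafeRetriever retriever, List<Link> links) {
        List<HTMLPage> pages = new ArrayList<>();
        for (Link link : links) {
            try {
                pages.add(retriever.getHTMLPage(link));
            } catch (PathDisallowedException e) {
                // Access is prohibited for this link; move on to the next one.
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
                new Link("http://example.com/index.html"),
                new Link("http://example.com/private/a.html"));
        List<HTMLPage> pages = fetchAllowed(new StubSafeRetriever(), links);
        System.out.println("Pages fetched: " + pages.size());
    }
}
```

With the real class, the same loop would presumably use a `SafeHTMLPageRetriever` instance in place of the stub, reusing one instance across calls so that the exclusion information it accumulates carries over.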
