SafeHTMLPageRetriever

ir.webutils
Class SafeHTMLPageRetriever

java.lang.Object
  ir.webutils.HTMLPageRetriever
      ir.webutils.SafeHTMLPageRetriever

public final class SafeHTMLPageRetriever
extends HTMLPageRetriever

Keeps track of Robots Exclusion information. Clients can use this class to ensure that they do not access pages prohibited by either the Robots Exclusion Protocol or Robots META tags.

Constructor Summary
`SafeHTMLPageRetriever()`

Method Summary
`HTMLPage`	`getHTMLPage(Link link)` Tries to download the given web page.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

SafeHTMLPageRetriever

public SafeHTMLPageRetriever()

Method Detail

getHTMLPage

public HTMLPage getHTMLPage(Link link)
                     throws PathDisallowedException

Tries to download the given web page. Throws PathDisallowedException if access to the page is prohibited. Also updates Robots Exclusion information based on the new page.

Overrides: getHTMLPage in class HTMLPageRetriever

Parameters: link - The Link to follow and download.
Returns: The web page specified by the link's URL.
Throws: PathDisallowedException - If the link's URL is disallowed by a robots.txt file or Robots META tag.
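
The page above does not show how the class decides that a path is disallowed. As a rough illustration of the two kinds of check it mentions, the sketch below tests a path against robots.txt-style Disallow prefixes and inspects a Robots META content value. The class `RobotsRuleSketch` and its methods `isAllowed` and `metaForbidsIndexing` are hypothetical names invented for this example; they are not part of ir.webutils and are not claimed to mirror its implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not part of ir.webutils.
class RobotsRuleSketch {

    // Returns true when no Disallow prefix matches the start of the path.
    static boolean isAllowed(String path, List<String> disallowedPrefixes) {
        for (String prefix : disallowedPrefixes) {
            // An empty Disallow value means nothing is disallowed, so skip it.
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    // Returns true when a Robots META content value asks robots not to index
    // the page ("noindex", or "none", which stands for "noindex, nofollow").
    static boolean metaForbidsIndexing(String content) {
        for (String directive : content.toLowerCase(Locale.ROOT).split(",")) {
            String d = directive.trim();
            if (d.equals("noindex") || d.equals("none")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed example rules, as they might be read from a robots.txt file.
        List<String> rules = Arrays.asList("/private/", "/tmp/");
        System.out.println(isAllowed("/index.html", rules));           // true
        System.out.println(isAllowed("/private/notes.html", rules));   // false
        System.out.println(metaForbidsIndexing("NOINDEX, nofollow"));  // true
    }
}
```

A retriever built along these lines would presumably run such checks before and after a download and signal a failed check with an exception, which is the role PathDisallowedException plays in getHTMLPage.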
