java.lang.Object
  ir.webutils.HTMLPageRetriever
    ir.webutils.SafeHTMLPageRetriever
public final class SafeHTMLPageRetriever
extends HTMLPageRetriever
Keeps track of Robots Exclusion information. Clients can use this class to ensure that they do not access pages prohibited either by the Robots Exclusion Protocol or Robots META tags.
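To illustrate the kind of rule the Robots Exclusion Protocol expresses, here is a minimal sketch of prefix-based `Disallow` matching for the `User-agent: *` group of a robots.txt file. The class `RobotsRulesSketch` and its methods are hypothetical and are not part of `ir.webutils`; how `SafeHTMLPageRetriever` stores and checks its rules internally is not documented here, and this sketch ignores `Allow` lines and wildcards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; not part of ir.webutils. */
public class RobotsRulesSketch {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    /** Collects Disallow prefixes from groups that name "User-agent: *". */
    public RobotsRulesSketch(String robotsTxt) {
        boolean appliesToUs = false;   // does the current group cover all agents?
        boolean inAgentLines = false;  // are we inside a run of User-agent lines?
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Strip trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!inAgentLines) {
                    appliesToUs = false; // a new group starts here
                }
                appliesToUs |= value.equals("*");
                inAgentLines = true;
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && appliesToUs && !value.isEmpty()) {
                    disallowedPrefixes.add(value);
                }
            }
        }
    }

    /** Returns false if the path starts with any collected Disallow prefix. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler would typically build one such rule set per host and consult it before each request.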
| Constructor Summary |
|---|
| SafeHTMLPageRetriever() |
| Method Summary | |
|---|---|
| HTMLPage | getHTMLPage(Link link) Tries to download the given web page. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public SafeHTMLPageRetriever()
| Method Detail |
|---|
public HTMLPage getHTMLPage(Link link)
                     throws PathDisallowedException
Tries to download the given web page. Throws PathDisallowedException if access to the page is prohibited. Also updates Robots Exclusion information based on the new page.
Overrides: getHTMLPage in class HTMLPageRetriever
Parameters: link - The Link to follow and download.
Throws: PathDisallowedException - If the link's URL is disallowed by a robots.txt file or Robots META tag.
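
A client would normally wrap the call in a try/catch so that a disallowed page is skipped rather than aborting the crawl. The sketch below shows that pattern. So that it compiles on its own, it declares minimal stand-ins for `Link`, `HTMLPage`, `PathDisallowedException` and `SafeHTMLPageRetriever`; those nested classes, their constructors, the `/private/` rule and the helper `fetchOrSkip` are assumptions made for illustration, not the real `ir.webutils` implementations.

```java
/** Hypothetical usage sketch; the nested types are stand-ins, not ir.webutils. */
public class SafeRetrievalSketch {

    /** Stand-in for ir.webutils.Link (constructor shape is assumed). */
    public static class Link {
        public final String url;
        public Link(String url) { this.url = url; }
    }

    /** Stand-in for ir.webutils.HTMLPage. */
    public static class HTMLPage {
        public final Link link;
        public final String text;
        public HTMLPage(Link link, String text) { this.link = link; this.text = text; }
    }

    /** Stand-in for ir.webutils.PathDisallowedException (assumed checked). */
    public static class PathDisallowedException extends Exception {
        public PathDisallowedException(String message) { super(message); }
    }

    /** Stand-in retriever: pretends every "/private/" URL is disallowed. */
    public static class SafeHTMLPageRetriever {
        public HTMLPage getHTMLPage(Link link) throws PathDisallowedException {
            if (link.url.contains("/private/")) {
                throw new PathDisallowedException(link.url);
            }
            return new HTMLPage(link, "<html>stub</html>");
        }
    }

    /** Returns the page, or null when robots rules forbid the download. */
    public static HTMLPage fetchOrSkip(SafeHTMLPageRetriever retriever, Link link) {
        try {
            return retriever.getHTMLPage(link);
        } catch (PathDisallowedException e) {
            // The page is off limits; skip it instead of failing the whole crawl.
            return null;
        }
    }
}
```

With the real library, the same try/catch would surround a call to `getHTMLPage` on an `ir.webutils.SafeHTMLPageRetriever` instance.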