HTMLPage

ir.webutils
Class HTMLPage

java.lang.Object
  ir.webutils.HTMLPage

Direct Known Subclasses: SafeHTMLPage

public class HTMLPage
extends java.lang.Object

HTMLPage is a representation of information about a web page.

Field Summary
`protected Link`	`link` The original link to this page
`protected java.util.List<Link>`	`outLinks` The links on this page
`protected java.lang.String`	`text` The text of the page

Constructor Summary
`HTMLPage(Link link, java.lang.String text)` Constructs an `HTMLPage` with the given link and text.

Method Summary
`protected static java.net.URL`	`addEndSlash(java.net.URL url)` If the URL looks like a directory rather than a file, adds a "/" at the end so that it acts as a proper base URL for completing URLs in this page.
`boolean`	`empty()` Returns true if the page is empty or a 404 error.
`Link`	`getLink()` Returns the `Link` object that was used to access this page.
`java.util.List<Link>`	`getOutLinks()` Get the list of out links from this page.
`java.lang.String`	`getText()` Returns the full text of this page.
`boolean`	`indexAllowed()` Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag.
`void`	`setOutLinks(java.util.List<Link> links)` Sets the outLinks for this page to the given list.
`void`	`write(java.io.File dir, java.lang.String name)` Writes the web page to a file, adding a BASE HTML element that carries the original URL.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

link

protected final Link link

The original link to this page

text

protected final java.lang.String text

The text of the page

outLinks

protected java.util.List<Link> outLinks

The links on this page

Constructor Detail

HTMLPage

public HTMLPage(Link link,
                java.lang.String text)

Constructs an HTMLPage with the given link and text.

Parameters:
    link - Link object to the given page.
    text - The text of the page.

Method Detail

getText

public java.lang.String getText()

Returns the full text of this page. None of the HTML is stripped out.

Returns: The text of this page.

getLink

public Link getLink()

Returns the Link object that was used to access this page.

Returns: The Link object that was used to access this page.

setOutLinks

public void setOutLinks(java.util.List<Link> links)

Sets the outLinks for this page to the given list.

getOutLinks

public java.util.List<Link> getOutLinks()

Get the list of out links from this page.

indexAllowed

public boolean indexAllowed()

Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag. The default implementation always returns true.

Returns: true if and only if the page can be indexed. Always returns true in the default implementation.
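Since the default implementation always returns true, a subclass that wants to honor "NOINDEX" has to inspect the page text itself. The following is a minimal sketch of one way such a check could look. The class `RobotsMetaCheck` and its regular expressions are hypothetical and not part of ir.webutils; the source does not describe how any subclass actually performs this check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of ir.webutils. */
final class RobotsMetaCheck {

    // Any <meta ...> tag, matched case-insensitively.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // name=robots, with or without quotes around the value.
    private static final Pattern NAME_ROBOTS =
        Pattern.compile("name\\s*=\\s*[\"']?robots\\b", Pattern.CASE_INSENSITIVE);

    // content="..." with the quoted value captured in group 1.
    private static final Pattern CONTENT =
        Pattern.compile("content\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns false if a Robots META tag in the raw HTML lists "noindex". */
    static boolean indexAllowed(String html) {
        Matcher meta = META_TAG.matcher(html);
        while (meta.find()) {
            String tag = meta.group();
            if (!NAME_ROBOTS.matcher(tag).find()) {
                continue; // some other META tag
            }
            Matcher content = CONTENT.matcher(tag);
            if (content.find()) {
                // Directives are comma-separated, e.g. "noindex, nofollow".
                for (String directive : content.group(1).split(",")) {
                    if (directive.trim().equalsIgnoreCase("noindex")) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String page = "<html><head><meta name=\"robots\" content=\"noindex\"></head></html>";
        System.out.println(indexAllowed(page));
    }
}
```

A regex scan like this is only an approximation of HTML parsing. It is adequate for a sketch, but a real crawler would more likely rely on a proper HTML parser.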

empty

public boolean empty()

Returns true if the page is empty or a 404 error.

write

public void write(java.io.File dir,
                  java.lang.String name)

Writes the web page to a file, adding a BASE HTML element that carries the original URL.

Parameters:
    dir - The directory to store the file in.
    name - The name of the file.
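The BASE element matters because a saved copy no longer lives at its original address, so relative links in it would otherwise resolve against the local file path. The sketch below illustrates the idea under stated assumptions: the class `BaseElementWriter`, the placement of the element right after the opening `<head>` tag, and the UTF-8 encoding are all choices made for this example, since the source does not say how `write` does any of this.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of ir.webutils. */
final class BaseElementWriter {

    // Opening <head> tag, with optional attributes; does not match <header>.
    private static final Pattern HEAD_OPEN =
        Pattern.compile("<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with a BASE element pointing at the original URL. */
    static String withBase(String html, String originalUrl) {
        // Escape the two characters that would break a quoted attribute value.
        String href = originalUrl.replace("&", "&amp;").replace("\"", "&quot;");
        String base = "<base href=\"" + href + "\">";
        Matcher head = HEAD_OPEN.matcher(html);
        if (head.find()) {
            // Assumption: place BASE immediately after the opening <head> tag.
            return html.substring(0, head.end()) + base + html.substring(head.end());
        }
        // No <head> found: fall back to prepending the element.
        return base + html;
    }

    /** Writes the page, with its BASE element, to dir/name as UTF-8. */
    static void write(File dir, String name, String html, String originalUrl)
            throws IOException {
        byte[] bytes = withBase(html, originalUrl).getBytes(StandardCharsets.UTF_8);
        Files.write(new File(dir, name).toPath(), bytes);
    }

    public static void main(String[] args) {
        System.out.println(withBase("<html><head></head><body></body></html>",
                                    "http://example.com/docs/"));
    }
}
```

Keeping the string transformation in `withBase` separate from the file I/O makes the interesting part easy to test without touching the disk.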

addEndSlash

protected static java.net.URL addEndSlash(java.net.URL url)

If the URL looks like a directory rather than a file, adds a "/" at the end so that it acts as a proper base URL for completing URLs in this page.
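The documentation does not spell out what "looks like a directory" means. As a minimal sketch, one plausible heuristic is to treat a URL as a directory when it has no query or fragment and its last path segment contains no "." (that is, no file extension). The class `EndSlashSketch`, the helper `url`, and the heuristic itself are assumptions made for illustration, not the library's actual rule.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical sketch, not the ir.webutils implementation. */
final class EndSlashSketch {

    /** Appends "/" when the URL looks like a directory under the assumed heuristic. */
    static URL addEndSlash(URL url) {
        // Leave URLs with a query string or fragment untouched.
        if (url.getQuery() != null || url.getRef() != null) {
            return url;
        }
        String path = url.getPath();
        if (path.endsWith("/")) {
            return url; // already a proper base URL
        }
        // Assumed heuristic: a "." in the last segment signals a file name.
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        if (lastSegment.contains(".")) {
            return url;
        }
        return url(url.toString() + "/");
    }

    /** Convenience helper for this example: parses a URL without checked exceptions. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addEndSlash(url("http://example.com/docs")));
        System.out.println(addEndSlash(url("http://example.com/docs/index.html")));
    }
}
```

The trailing slash changes how relative links resolve: against `http://example.com/docs`, the relative link `page.html` resolves to `http://example.com/page.html`, whereas against `http://example.com/docs/` it resolves to `http://example.com/docs/page.html`.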
