com.sapportals.wcm.util.html

Interface IHTMLReader

All Known Subinterfaces:
IHTMLFilter
All Known Implementing Classes:
HTMLFilterImpl, HTMLScriptRemover

public interface IHTMLReader

Reads HTML documents and generates events.

The IHTMLReader generates events for HTML documents. Events are sent to the IHTMLContentHandler. There can be only one content handler per reader.

A document is parsed by first setting the input source and then calling parse() once or parseNextEvent() repeatedly. parseNextEvent() parses the document until the next event has been sent to the content handler and then returns to the caller. It is not guaranteed that exactly one event is generated per call.
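The calling sequence described above can be sketched as follows. The real types live in com.sapportals.wcm.util.html and are not available here, so ReaderLike is a stand-in that declares only the methods used (without HTMLException), and ToyReader is a hypothetical implementation that treats every character as one event. Neither reflects how an actual implementation tokenizes HTML.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ParseLoopSketch {

    // Stand-in for the subset of IHTMLReader used here (assumption: the real
    // methods additionally declare HTMLException, omitted in this sketch).
    interface ReaderLike {
        void setSource(Reader input) throws IOException;
        boolean parseNextEvent() throws IOException;
        void discard();
    }

    // Hypothetical toy implementation: every character counts as one event.
    static class ToyReader implements ReaderLike {
        private Reader source;
        int events = 0;

        public void setSource(Reader input) {
            source = input;
        }

        public boolean parseNextEvent() throws IOException {
            if (source == null) {
                throw new IllegalStateException("no source set");
            }
            int c = source.read();
            if (c < 0) {
                return false; // source is read empty
            }
            events++;
            return true;
        }

        public void discard() {
            source = null;
        }
    }

    // Set the source first, then pull events until the reader reports no more.
    static int drain(ReaderLike reader, Reader input) throws IOException {
        int calls = 0;
        try {
            reader.setSource(input);
            while (reader.parseNextEvent()) {
                calls++; // counts the calls that returned true
            }
        } finally {
            reader.discard(); // frees resources even if parsing aborts early
        }
        return calls;
    }

    // Convenience wrapper that hides the checked exception.
    static int countEvents(String html) {
        ToyReader reader = new ToyReader();
        try {
            drain(reader, new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader.events;
    }

    public static void main(String[] args) {
        System.out.println(countEvents("<p>hi</p>"));
    }
}
```

Calling discard() in a finally block is a design choice of this sketch: the documentation states it is not required once parsing has finished, so the call matters mainly when an exception ends the loop early. Replacing the loop with a single parse() call would process the whole document in one step.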

Input Sources and Encodings:

Note that implementations of this interface are not thread-safe.

Copyright (c) SAP AG 2001-2002


Method Summary
 void discard()
          Free all allocated resources.
 IHTMLContentHandler getContentHandler()
          Get the registered content handler.
 String getEncoding()
          Return the encoding used in the document.
 ITextContentHandler getRawContentHandler()
          Get the registered raw content handler.
 void parse()
          Parse the complete document, generating events, until the source is read empty.
 boolean parseNextEvent()
          Parse the document, generating events, and return to the caller.
 void setContentHandler(IHTMLContentHandler handler)
          Set the content handler to a new value.
 void setRawContentHandler(ITextContentHandler handler)
          Set the raw content handler to a new value.
 void setSource(InputStream input)
          Set InputStream as document source.
 void setSource(InputStream input, String encoding)
          Set InputStream as document source, use the given encoding.
 void setSource(Reader input)
          Set Reader as document source; encoding is irrelevant.
 

Method Detail

getContentHandler

IHTMLContentHandler getContentHandler()
Get the registered content handler. Returns null if none is installed.

Returns:
registered content handler

getRawContentHandler

ITextContentHandler getRawContentHandler()
Get the registered raw content handler. Returns null if none is installed.

Returns:
registered raw content handler

setContentHandler

void setContentHandler(IHTMLContentHandler handler)
Set the content handler to a new value. Passing null deregisters an installed handler.

Parameters:
handler - to register

setRawContentHandler

void setRawContentHandler(ITextContentHandler handler)
Set the raw content handler to a new value. Passing null deregisters an installed handler.

Parameters:
handler - to register

getEncoding

String getEncoding()
                   throws HTMLException,
                          IOException
Return the encoding used in the document.

Returns:
encoding used in document or null if unknown.
Throws:
HTMLException - when document is not legal HTML
IOException - on read errors

setSource

void setSource(InputStream input)
               throws HTMLException,
                      IOException
Set InputStream as document source. Encoding will be detected.

Parameters:
input - stream to read document from
Throws:
HTMLException - when document is not legal HTML
IOException - on read errors

setSource

void setSource(InputStream input,
               String encoding)
               throws HTMLException,
                      IOException
Set InputStream as document source, use the given encoding.

Parameters:
input - stream to read document from
encoding - to use for stream
Throws:
HTMLException - when document is not legal HTML
IOException - on read errors

setSource

void setSource(Reader input)
               throws HTMLException,
                      IOException
Set Reader as document source; encoding is irrelevant.

Parameters:
input - to read document from
Throws:
HTMLException - when document is not legal HTML
IOException - on read errors
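The three setSource variants differ in who decodes the bytes. As a minimal sketch using only standard java.io, the following shows how a caller who already knows the encoding could decode a stream into a Reader; the resulting Reader is the kind of object that setSource(Reader) accepts. The class and method names (SourceSketch, asReader, decode) are hypothetical, and nothing here describes how the library detects encodings in setSource(InputStream).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class SourceSketch {

    // Decode the stream with a caller-supplied encoding name. An unknown
    // name raises UnsupportedEncodingException, a subclass of IOException.
    static Reader asReader(InputStream in, String encoding) throws IOException {
        return new InputStreamReader(in, encoding);
    }

    // Read all characters from the decoded stream into a String.
    static String decode(byte[] bytes, String encoding) {
        try (Reader reader = asReader(new ByteArrayInputStream(bytes), encoding)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) >= 0) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The byte 0xE9 is U+00E9 in ISO-8859-1 but malformed as UTF-8.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        System.out.println(decode(latin1, "ISO-8859-1"));
    }
}
```

Once the bytes are decoded this way, the encoding no longer matters to the consumer, which matches the note that encoding is irrelevant for the Reader variant.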

parse

void parse()
           throws HTMLException,
                  IOException
Parse the complete document, generating events, until the source is read empty.

Throws:
HTMLException - when document is not legal HTML
IOException - on read errors

parseNextEvent

boolean parseNextEvent()
                       throws HTMLException,
                              IOException
Parse the document, generating events, and return to the caller. Returns true as long as there are more events to read.

Returns:
true if there are more events to read
Throws:
HTMLException - when document is not legal HTML
IOException - on read errors

discard

void discard()
Free all allocated resources. Calling this method is not necessary once parsing has finished.

Access Rights

This class can be accessed from:


SC                 DC                              Public Part  ACH
[sap.com] KMC-CM   [sap.com] tc/km/frwk            api          EP-KM-CM
[sap.com] KMC-WPC  [sap.com] tc/kmc/wpc/wpcfacade  api          EP-PIN-WPC-WCM


Copyright 2014 SAP AG