This Tech Tip reprinted with permission by java.sun.com
Most Java developers who work with XML are familiar with the Simple
API for XML (SAX) and the Document Object Model (DOM) libraries. SAX is
an event-based API: a programmer typically registers a number of
listeners with the parser, and when a specific XML grammar construct is
reached (for example, an element or an attribute), the corresponding
listener method is called. DOM, on the other hand, has a tree-based
architecture: it scans in the entire document and builds an object tree
with a node for each grammar construct it encounters. A programmer can
then access and modify the object tree after the scanning is complete.
Both of these approaches have their drawbacks. Event-based APIs that
make use of listeners are generally harder to work with because they're
driven by the parser. Tree-based APIs can consume an inordinate
amount of memory in comparison to the size of the document being
scanned. However, there is now a third API available for Java
developers to scan XML: the Streaming API for XML, or StAX.
What is the SJSXP?
The Sun Java Streaming XML Parser (SJSXP) is a high-speed implementation
of StAX. BEA Systems, working in conjunction with Sun Microsystems, Inc.,
as well as XML guru James Clark, Stefan Haustein and Aleksandr
Slominski (the XmlPull developers), and others in the Java Community
Process, developed StAX under JSR 173.
StAX is a parser-independent Java API based on a set of common
interfaces.
The SJSXP is included with version 1.5 of the Java
Web Services Developer Pack.
The first thing that you're likely to notice about SJSXP is that it is
based on a streaming API, which does not need to read an entire
document before a developer can access any of the nodes. It also does
not adhere to the principle of starting the parser and allowing the
parser to "push" data to the event listener methods. Instead, SJSXP
implements a "pull" method, where the parser maintains a pointer of
sorts to the currently-scanned location in the document--this is often
called a cursor. You simply ask the parser for the node that the cursor
currently points to.
Using SJSXP to Parse XML Documents
Reading in XML documents with the SJSXP is fairly easy. Most of the
work is done through an object that implements the javax.xml.stream.XMLStreamReader
interface. This interface represents a cursor that's moved across an
XML document from beginning to end. A few things to keep in mind: the
cursor always points to a single item, such as an element start-tag, a
processing instruction, or a DTD declaration. Also, the cursor always
moves forward (not backward), and you cannot perform any "look aheads"
to see what's upcoming in the document. You can obtain an XMLStreamReader
to read in XML from a file with the following snippet of code:
URL url = Class.forName("MyClassName").getResource(
"sample.xml");
InputStream in = url.openStream();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
You can then iterate through the XML file with the following
code:
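while (parser.hasNext()) {
    // next() advances the cursor and returns the event type code
    int eventType = parser.next();
    switch (eventType) {
        case XMLStreamConstants.START_ELEMENT:
            // Do something
            break;
        case XMLStreamConstants.END_ELEMENT:
            // Do something
            break;
        // And so on ...
    }
}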
The hasNext() method in XMLStreamReader
checks to see if there is another item available in the XML file. If
there is one, you can use the next() method
to advance the cursor to the next item. The next()
method returns an integer code that indicates the type of grammatical
construct (the item) that it encountered.
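To try the loop without a file on disk, the following self-contained sketch feeds the parser an in-memory string and counts the start-tags it pulls. The class name PullLoopSketch and the method countStartElements are illustrative names chosen for this example; they are not part of SJSXP or StAX.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative class name, not part of the StAX API.
class PullLoopSketch {

    /** Pulls events one at a time and counts the START_ELEMENT events. */
    static int countStartElements(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader parser =
            factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (parser.hasNext()) {
                // next() moves the cursor forward and reports what it found
                int eventType = parser.next();
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            parser.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(countStartElements("<a><b/><b>text</b></a>"));
    }
}
```

Because the application drives the loop, it can stop pulling events as soon as it has what it needs.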
There are a number of get methods in XMLStreamReader
that you can use to obtain the contents of the XML item that the cursor
is pointing to. The first method is getEventType():
public int getEventType()
The method returns an integer code that identifies the type of item the
parser found under the cursor. It's the same code returned by the next()
method. Each item is identified by one of the following XMLStreamConstants
constants:
XMLStreamConstants.START_DOCUMENT
XMLStreamConstants.END_DOCUMENT
XMLStreamConstants.START_ELEMENT
XMLStreamConstants.END_ELEMENT
XMLStreamConstants.ATTRIBUTE
XMLStreamConstants.CHARACTERS
XMLStreamConstants.CDATA
XMLStreamConstants.SPACE
XMLStreamConstants.COMMENT
XMLStreamConstants.DTD
XMLStreamConstants.START_ENTITY
XMLStreamConstants.END_ENTITY
XMLStreamConstants.ENTITY_DECLARATION
XMLStreamConstants.ENTITY_REFERENCE
XMLStreamConstants.NAMESPACE
XMLStreamConstants.NOTATION_DECLARATION
XMLStreamConstants.PROCESSING_INSTRUCTION
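When debugging, it can help to print a readable label instead of the raw integer code. The helper below is a minimal sketch that covers only a handful of the more common constants and falls back to a generic label for everything else. The class EventNames and the method nameOf are hypothetical names for this example.

```java
import javax.xml.stream.XMLStreamConstants;

// Illustrative helper, not part of the StAX API.
class EventNames {

    /** Returns a readable label for some common event codes. */
    static String nameOf(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT:    return "END_ELEMENT";
            case XMLStreamConstants.CHARACTERS:     return "CHARACTERS";
            case XMLStreamConstants.COMMENT:        return "COMMENT";
            case XMLStreamConstants.DTD:            return "DTD";
            case XMLStreamConstants.PROCESSING_INSTRUCTION:
                return "PROCESSING_INSTRUCTION";
            default:
                // Codes not listed above are reported by number only
                return "UNKNOWN(" + eventType + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf(XMLStreamConstants.START_ELEMENT));
    }
}
```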
If the item has a name, you can use the getName()
and getLocalName()
methods to obtain the name. The latter yields the raw name, without any
extra information (for example, the name of the element without a
qualifying namespace).
public QName getName()
public String getLocalName()
If you want to identify the namespace of the current item, you
can use
the getNamespaceURI() method:
public String getNamespaceURI()
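The three name accessors can be compared side by side on a namespaced element. This is a small sketch, assuming a namespace-aware parser; NameSketch and describeFirstElement are illustrative names, and the input document is supplied by the caller.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative class name, not part of the StAX API.
class NameSketch {

    /**
     * Returns { qualified name, local name, namespace URI } for the first
     * element in the document, or null if there is no element.
     */
    static String[] describeFirstElement(String xml) throws XMLStreamException {
        XMLStreamReader parser = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    QName qualified = parser.getName();
                    return new String[] {
                        qualified.toString(),     // "{namespace}local" form
                        parser.getLocalName(),    // name without namespace
                        parser.getNamespaceURI()  // namespace on its own
                    };
                }
            }
            return null;
        } finally {
            parser.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<h:html xmlns:h='http://www.w3.org/TR/REC-html40'/>";
        for (String part : describeFirstElement(xml)) {
            System.out.println(part);
        }
    }
}
```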
If there is any accompanying text, such as the text in a DTD
declaration or text inside an element, you can use the following
methods to obtain it (the latter is used solely for elements):
public String getText()
public String getElementText()
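As a rough illustration of getElementText(), the sketch below scans for the first element with a given local name and returns its text. It assumes the target element contains only text, since getElementText() is meant for text-only elements. ElementTextSketch and firstElementText are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative class name, not part of the StAX API.
class ElementTextSketch {

    /** Returns the text of the first element named localName, or null. */
    static String firstElementText(String xml, String localName)
            throws XMLStreamException {
        XMLStreamReader parser = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT
                        && parser.getLocalName().equals(localName)) {
                    // Only valid while the cursor is on a start-tag
                    return parser.getElementText();
                }
            }
            return null;
        } finally {
            parser.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><title>Java Information</title><p>x</p></doc>";
        System.out.println(firstElementText(xml, "title"));
    }
}
```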
If an element has attributes associated with it, you can use
the getAttributeCount()
method to obtain the number of attributes the current element has. You
can then retrieve information on each of them using the getAttributeName()
and getAttributeValue() methods:
public int getAttributeCount()
public QName getAttributeName(int index)
public String getAttributeValue(int index)
If you know the local name of the attribute and its namespace URI,
you can also obtain the attribute value using the
following method:
public String getAttributeValue(
    String namespaceURI, String localName)
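Both styles of attribute access, by index and by name, might be used as in the sketch below. AttributeSketch and its two methods are illustrative names. Passing null as the namespace URI relies on the StAX rule that a null namespace is not checked for equality.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative class name, not part of the StAX API.
class AttributeSketch {

    /** Moves the cursor to the first start-tag named elementName. */
    private static boolean seek(XMLStreamReader parser, String elementName)
            throws XMLStreamException {
        while (parser.hasNext()) {
            if (parser.next() == XMLStreamConstants.START_ELEMENT
                    && parser.getLocalName().equals(elementName)) {
                return true;
            }
        }
        return false;
    }

    /** Collects every attribute of the element by walking the indexes. */
    static Map<String, String> attributesOf(String xml, String elementName)
            throws XMLStreamException {
        XMLStreamReader parser = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        Map<String, String> result = new LinkedHashMap<>();
        try {
            if (seek(parser, elementName)) {
                for (int i = 0; i < parser.getAttributeCount(); i++) {
                    result.put(parser.getAttributeName(i).getLocalPart(),
                               parser.getAttributeValue(i));
                }
            }
            return result;
        } finally {
            parser.close();
        }
    }

    /** Looks up a single attribute by local name, ignoring its namespace. */
    static String attributeByName(String xml, String elementName,
            String attributeName) throws XMLStreamException {
        XMLStreamReader parser = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            return seek(parser, elementName)
                ? parser.getAttributeValue(null, attributeName)
                : null;
        } finally {
            parser.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<p><a href=\"http://java.sun.com\" target=\"_blank\">here</a></p>";
        System.out.println(attributesOf(xml, "a"));
        System.out.println(attributeByName(xml, "a", "href"));
    }
}
```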
As you might have guessed, not all of the accessor methods are
applicable in every state. For example, if you are currently
processing a DTD, you cannot call getElementText().
If you do so, you will either receive an XMLStreamException
stating that the parser has identified a conflicting event type, or the
method itself will return null.
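One way to avoid calling an accessor in the wrong state is to check the event type first. The sketch below calls getText() only when the cursor is on character data and concatenates the pieces. TextSketch and textContent are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative class name, not part of the StAX API.
class TextSketch {

    /** Concatenates all character data in the document, ignoring markup. */
    static String textContent(String xml) throws XMLStreamException {
        XMLStreamReader parser = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (parser.hasNext()) {
                switch (parser.next()) {
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // getText() is safe here: the cursor is on text
                        text.append(parser.getText());
                        break;
                    default:
                        // Other event types are skipped in this sketch
                        break;
                }
            }
        } finally {
            parser.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(textContent("<p>Java homepage is <a>here</a></p>"));
    }
}
```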
You can turn on a number of parser properties by using the setProperty()
method of the XMLInputFactory class. For
example, the following specifies that entity references encountered by
the parser will be replaced:
factory.setProperty(
XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES,
Boolean.TRUE);
To prevent the parser from supporting external entities, use
the following setting:
factory.setProperty(
XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES,
Boolean.FALSE);
To make the parser namespace aware, use the following setting:
factory.setProperty(
XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
Note that the current version of SJSXP will accept the
following command, but the parser is currently non-validating.
factory.setProperty(XMLInputFactory.IS_VALIDATING,
Boolean.TRUE);
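The property settings above can be gathered into one place so that every reader in an application is created from the same configuration. The following is a minimal sketch; FactoryConfigSketch and newConfiguredFactory are illustrative names, and the validating property is deliberately left alone. Reading the values back with getProperty() is one way to confirm what the factory accepted.

```java
import javax.xml.stream.XMLInputFactory;

// Illustrative class name, not part of the StAX API.
class FactoryConfigSketch {

    /** Builds a factory with the three settings discussed above. */
    static XMLInputFactory newConfiguredFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Replace entity references with their replacement text
        factory.setProperty(
            XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // Do not resolve external entities
        factory.setProperty(
            XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Report namespace information on elements and attributes
        factory.setProperty(
            XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        return factory;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = newConfiguredFactory();
        System.out.println(factory.getProperty(
            XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES));
    }
}
```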
If any of these XMLInputFactory
properties
are enabled, you can use the setXMLReporter()
method to handle any errors faced by the parser. The easiest way to
determine exactly what type of error the parser encountered is to use
the following anonymous inner class in conjunction with the setXMLReporter()
method:
factory.setXMLReporter(new XMLReporter() {
    public void report(String message, String errorType,
            Object relatedInformation, Location location) {
        System.err.println("Error in "
            + location.getLocationURI());
        System.err.println("at line "
            + location.getLineNumber()
            + ", column " + location.getColumnNumber());
        System.err.println(message);
    }
});
Using SJSXP to Write XML Documents
Writing XML output is easy with SJSXP. In this case, you can use the XMLStreamWriter
interface instead of the XMLStreamReader
interface. The XMLStreamWriter
interface provides direct methods to write elements, attributes,
comments, text, and all the other parts of an XML document. The
following example shows how to obtain this interface and use it to
write an XML document:
XMLOutputFactory xof = XMLOutputFactory.newInstance();
XMLStreamWriter xtw =
    xof.createXMLStreamWriter(new FileWriter("myFile"));
// The XML declaration must come first in the document
xtw.writeStartDocument("utf-8","1.0");
xtw.writeComment(
    "all elements here are in the HTML namespace");
xtw.setPrefix("html", "http://www.w3.org/TR/REC-html40");
xtw.writeStartElement(
    "http://www.w3.org/TR/REC-html40","html");
xtw.writeNamespace(
    "html", "http://www.w3.org/TR/REC-html40");
xtw.writeStartElement(
    "http://www.w3.org/TR/REC-html40","head");
xtw.writeStartElement(
    "http://www.w3.org/TR/REC-html40","title");
xtw.writeCharacters("Java Information");
xtw.writeEndElement();  // title
xtw.writeEndElement();  // head
xtw.writeStartElement(
    "http://www.w3.org/TR/REC-html40","body");
xtw.writeStartElement("http://www.w3.org/TR/REC-html40","p");
xtw.writeCharacters("Java homepage is ");
xtw.writeStartElement("http://www.w3.org/TR/REC-html40","a");
xtw.writeAttribute("href","http://java.sun.com");
xtw.writeCharacters("here");
xtw.writeEndElement();  // a
xtw.writeEndElement();  // p
xtw.writeEndElement();  // body
xtw.writeEndElement();  // html
xtw.writeEndDocument();
xtw.flush();
xtw.close();
When you finish writing out each of the elements, you need to
flush and close the writer.
The preceding code will output the following XML (formatted here with
line breaks for easier reading):
<?xml version="1.0" encoding="utf-8"?>
<!--all elements here are in the HTML namespace-->
<html:html xmlns:html="http://www.w3.org/TR/REC-html40">
<html:head>
<html:title>Java Information</html:title>
</html:head>
<html:body>
<html:p>
Java homepage is <html:a href="http://java.sun.com">here</html:a>
</html:p>
</html:body>
</html:html>
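For quick experiments it can be convenient to write into a string rather than a file. The sketch below writes a single link element to a StringWriter and returns the result. WriterSketch and writeLink are hypothetical names, and the exact form of the XML declaration in the output may vary by StAX implementation.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative class name, not part of the StAX API.
class WriterSketch {

    /** Writes a one-element document and returns it as a string. */
    static String writeLink(String href, String label)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xtw =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xtw.writeStartDocument();
        xtw.writeStartElement("a");
        xtw.writeAttribute("href", href);
        // Characters such as '<' are escaped by the writer
        xtw.writeCharacters(label);
        xtw.writeEndElement();
        xtw.writeEndDocument();
        xtw.flush();
        xtw.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeLink("http://java.sun.com", "here"));
    }
}
```

Note that the writer takes care of escaping text content, so the application passes raw strings to writeCharacters().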
Filtering XML Documents
You can create a filter for an incoming XML document if you
don't
want to scan through each item type. To do so, create a class that
implements the javax.xml.stream.StreamFilter
interface. This interface consists of only one method, accept(),
that accepts an XMLStreamReader object and
returns a primitive boolean. A typical implementation of StreamFilter
looks like the following:
public class MyStreamFilter implements StreamFilter {
    // Accept only element start-tags and end-tags
    public boolean accept(XMLStreamReader reader) {
        return reader.isStartElement() || reader.isEndElement();
    }
}
You then create a filtered reader by calling the createFilteredReader()
method of the XMLInputFactory, and pass in
both the original XML stream reader and the StreamFilter
implementation. This is shown below:
XMLStreamReader filteredParser = factory.createFilteredReader(
    factory.createXMLStreamReader(in), new MyStreamFilter());
For more information on the SJSXP, see the Sun
Java Streaming XML Parser release notes.
Running the Sample Code for the Sun Java Streaming
XML Parser
- Download the sample archive (ttfeb2005sjsxp.jar)
for this tech tip.
- Download and install Java
WSDP 1.5 from the Java Web Services Developer Pack Downloads
page.
- Change to the directory where you downloaded the sample
archive. Uncompress the JAR file for the sample archive as follows:
jar xvf ttfeb2005sjsxp.jar
- Set your
classpath to include the
ttfeb2005sjsxp.jar and jsr173_api.jar
files, which are located in the sjsxp/lib
directory of the Java WSDP 1.5 installation.
- Compile and run the
SJSXPInput
executable.
In response, you should see an entry similar to the following for each
XML item:
Event Type (Code=11): DTD
Without a Name
With Text: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
-----------------------------
- Compile and run the
SJSXPOutput
executable. The output will be sent to a file named XMLOutputFile
and consist of the elements shown in the output example above.
Copyright (c) 2004-2005 Sun Microsystems, Inc.
All Rights Reserved.