Introducing the Sun Java Streaming XML Parser

This Tech Tip is reprinted with permission from java.sun.com

Most Java developers who work with XML are familiar with the Simple API for XML (SAX) and the Document Object Model (DOM) libraries. SAX is an event-based API: a programmer typically registers a number of listeners with the parser, and when the parser reaches a specific XML grammar construct (for example, an element or an attribute), it calls the corresponding listener method. DOM, on the other hand, has a tree-based architecture: it scans in the entire document and builds an object tree with a node for each grammar construct it encounters. A programmer can then access and modify the object tree after the scanning is complete.

Both of these approaches have their drawbacks. Event-based APIs that make use of listeners are generally harder to work with, because the parser rather than the application drives the processing. Tree-based APIs can consume an inordinate amount of memory in comparison to the size of the document being scanned. However, there is now a third API available for Java developers to scan XML: the Streaming API for XML, or StAX.

What is the SJSXP?

The Sun Java Streaming XML Parser (SJSXP) is a high-speed implementation of StAX. BEA Systems, working in conjunction with Sun Microsystems, Inc., as well as XML guru James Clark, Stefan Haustein, and Aleksandr Slominski (XmlPull developers), and others in the Java Community Process, developed StAX as JSR 173. StAX is a parser-independent Java API based on a set of common interfaces.

The SJSXP is included with version 1.5 of the Java Web Services Developer Pack. The first thing that you're likely to notice about SJSXP is that it is based on a streaming API, which does not need to read an entire document before a developer can access any of the nodes. It also does not adhere to the principle of starting the parser and allowing the parser to "push" data to the event listener methods. Instead, SJSXP implements a "pull" method, where the parser maintains a pointer of sorts to the currently scanned location in the document. This pointer is often called a cursor. You simply ask the parser for the node that the cursor currently points to.

Using SJSXP to Parse XML Documents

Reading in XML documents with the SJSXP is fairly easy. Most of the work is done through an object that implements the javax.xml.stream.XMLStreamReader interface. This interface represents a cursor that's moved across an XML document from beginning to end. A few things to keep in mind: the cursor always points to a single item, such as an element start-tag, a processing instruction, or a DTD declaration. Also, the cursor always moves forward (not backward), and you cannot perform any "look aheads" to see what's upcoming in the document. You can obtain an XMLStreamReader to read in XML from a file with the following snippet of code:

   URL url = Class.forName("MyClassName").getResource(
           "sample.xml");            
   InputStream in = url.openStream();
   XMLInputFactory factory = XMLInputFactory.newInstance();
   XMLStreamReader parser = factory.createXMLStreamReader(in);

You can then iterate through the XML file with the following code:

   while (parser.hasNext()) {

         // next() advances the cursor and returns the new event type
         int eventType = parser.next();
         switch (eventType) {

              case XMLStreamConstants.START_ELEMENT:
              //  Do something
              break;
              case XMLStreamConstants.END_ELEMENT:
              //  Do something
              break;
              //  And so on ...
         }
   }

The hasNext() method in XMLStreamReader checks to see if there is another item available in the XML file. If there is one, you can use the next() method to advance the cursor to the next item. The next() method returns an integer code that indicates the type of grammatical construct (the item) that it encountered.
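
As a rough, self-contained illustration of the loop described above, the following sketch parses a small in-memory document and records each element event. The class name PullLoopDemo, the helper method, and the sample XML are made up for this example and are not part of the tech tip's sample archive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names and sample data are illustrative only.
final class PullLoopDemo {

    // Walks the document with hasNext()/next() and records the element events seen.
    static List<String> elementEvents(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                int eventType = parser.next();   // advance the cursor by one item
                switch (eventType) {
                    case XMLStreamConstants.START_ELEMENT:
                        events.add("start:" + parser.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        events.add("end:" + parser.getLocalName());
                        break;
                    default:
                        break;                   // ignore text, comments, and so on
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(elementEvents("<tips><tip>StAX</tip><tip>SAX</tip></tips>"));
    }
}
```

Because the application calls next() itself, it can stop reading at any point, for example as soon as it has found the element it needs.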

There are a number of get methods in XMLStreamReader that you can use to obtain the contents of the XML item that the cursor is pointing to. The first method is getEventType():

 public int getEventType()

The method returns an integer code that identifies the type of item the parser found under the cursor. It's the same code returned by the next() method. The items are identified by one of the following XMLStreamConstants constants:

  XMLStreamConstants.START_DOCUMENT 
  XMLStreamConstants.END_DOCUMENT 
  XMLStreamConstants.START_ELEMENT 
  XMLStreamConstants.END_ELEMENT 
  XMLStreamConstants.ATTRIBUTE 
  XMLStreamConstants.CHARACTERS 
  XMLStreamConstants.CDATA 
  XMLStreamConstants.SPACE 
  XMLStreamConstants.COMMENT 
  XMLStreamConstants.DTD 
  XMLStreamConstants.START_ENTITY 
  XMLStreamConstants.END_ENTITY 
  XMLStreamConstants.ENTITY_DECLARATION 
  XMLStreamConstants.ENTITY_REFERENCE 
  XMLStreamConstants.NAMESPACE 
  XMLStreamConstants.NOTATION_DECLARATION 
  XMLStreamConstants.PROCESSING_INSTRUCTION 

If the item has a name, you can use the getName() and getLocalName() methods to obtain the name. The latter yields the raw name, without any extra information (for example, the name of the element without a qualifying namespace).

   public QName getName()
   public String getLocalName()

If you want to identify the namespace of the current item, you can use the getNamespaceURI() method:

   public String getNamespaceURI()

If there is any accompanying text, such as the text in a DTD declaration or text inside an element, you can use the following methods to obtain it (the latter is used solely for elements):

   public String getText()
   public String getElementText()
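
The difference between the two text accessors can be sketched as follows. This is only an illustration under assumed names: TextAccessDemo, its two helper methods, and the sample document are hypothetical. getElementText() is called while the cursor is on a start tag, whereas getText() is called on an item that carries text itself, here a comment.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names and sample data are illustrative only.
final class TextAccessDemo {

    private static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    // Returns the text content of the first element with the given local name, or null.
    static String textOf(String xml, String localName) {
        try {
            XMLStreamReader parser = open(xml);
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(parser.getLocalName())) {
                    // Valid only on a start tag of a text-only element.
                    return parser.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    // Returns the text of the first comment in the document, or null.
    static String firstComment(String xml) {
        try {
            XMLStreamReader parser = open(xml);
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.COMMENT) {
                    return parser.getText();   // text of the item under the cursor
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tip><!--draft--><title>Pull parsing</title></tip>";
        System.out.println(textOf(xml, "title"));
        System.out.println(firstComment(xml));
    }
}
```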

If an element has attributes associated with it, you can use the getAttributeCount() method to obtain the number of attributes the current element has. You can then retrieve information on each of them using the getAttributeName() and getAttributeValue() methods:

   public int getAttributeCount()
   public QName getAttributeName(int index)
   public String getAttributeValue(int index)
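
A minimal sketch of how the name, namespace, and attribute accessors might be combined is shown below. The class ElementInfoDemo, the urn:example:tips namespace, and the sample element are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names and sample data are illustrative only.
final class ElementInfoDemo {

    // Describes every start tag as "localName {namespaceURI} attr=value ...".
    static List<String> describeStartElements(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                if (parser.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;   // the attribute accessors apply to start tags
                }
                StringBuilder line = new StringBuilder();
                line.append(parser.getLocalName())
                    .append(" {").append(parser.getNamespaceURI()).append("}");
                for (int i = 0; i < parser.getAttributeCount(); i++) {
                    line.append(' ')
                        .append(parser.getAttributeName(i).getLocalPart())
                        .append('=')
                        .append(parser.getAttributeValue(i));
                }
                lines.add(line.toString());
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<t:tip xmlns:t=\"urn:example:tips\" id=\"7\" level=\"easy\"/>";
        describeStartElements(xml).forEach(System.out::println);
    }
}
```

Note that the namespace declaration (xmlns:t) is not counted among the attributes when the parser is namespace aware.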

If you know the local name of the attribute and its namespace URI, you can also obtain the attribute value using the following method:

   public String getAttributeValue(
     String namespaceURI, String localName)

As you might have guessed, not all of the accessor methods are applicable in every state. For example, if you are currently processing a DTD, you cannot call getElementText(). If you do so, you will either receive an XMLStreamException stating that the parser has identified a conflicting event type, or the method itself will return null.
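
One way to avoid calling an accessor in the wrong state is to check the cursor first. The sketch below assumes hypothetical names (StateCheckDemo and its methods) and relies on the fact that a newly created reader is positioned on START_DOCUMENT, where getElementText() does not apply.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names and sample data are illustrative only.
final class StateCheckDemo {

    // Guarded accessor: only asks for element text when the cursor is on a start tag.
    static String textIfOnStartElement(XMLStreamReader parser) throws XMLStreamException {
        return parser.isStartElement() ? parser.getElementText() : null;
    }

    // Returns true if getElementText() is rejected before any element has been reached.
    static boolean rejectedAtDocumentStart(String xml) {
        XMLStreamReader parser;
        try {
            parser = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not open the sample document", e);
        }
        try {
            parser.getElementText();   // cursor is still on START_DOCUMENT
            return false;
        } catch (XMLStreamException expected) {
            return true;               // the accessor does not apply in this state
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedAtDocumentStart("<tip>StAX</tip>"));
    }
}
```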

You can turn on a number of parser properties by using the setProperty() method of the XMLInputFactory class. For example, the following specifies that entity references encountered by the parser will be replaced:

   factory.setProperty(
     XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, 
       Boolean.TRUE);

To prevent the parser from supporting external entities, use the following setting:

   factory.setProperty(
     XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, 
       Boolean.FALSE);

To make the parser namespace aware, use the following setting:

   factory.setProperty(
     XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);

Note that the current version of SJSXP will accept the following setting, but the parser is currently non-validating:

   factory.setProperty(XMLInputFactory.IS_VALIDATING, 
     Boolean.TRUE);
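
The property settings above could be gathered into one helper and read back with getProperty(), roughly as sketched here. FactoryConfigDemo and its method names are hypothetical, and the sketch deliberately leaves IS_VALIDATING alone.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical demo class; names are illustrative only.
final class FactoryConfigDemo {

    // Builds a factory with the settings discussed in the text.
    static XMLInputFactory configuredFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        return factory;
    }

    // Reads a boolean property back; anything other than Boolean.TRUE counts as false.
    static boolean isEnabled(XMLInputFactory factory, String property) {
        return Boolean.TRUE.equals(factory.getProperty(property));
    }

    public static void main(String[] args) {
        XMLInputFactory factory = configuredFactory();
        System.out.println("external entities: "
            + isEnabled(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES));
        System.out.println("namespace aware: "
            + isEnabled(factory, XMLInputFactory.IS_NAMESPACE_AWARE));
    }
}
```

Properties are set on the factory, so they affect readers created afterwards from that factory.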

If any of these XMLInputFactory properties are enabled, you can use the setXMLReporter() method to handle any errors faced by the parser. The easiest way to determine exactly what type of error the parser encountered is to use the following anonymous inner class in conjunction with the setXMLReporter() method:

   factory.setXMLReporter(new XMLReporter() {
     public void report(String message, String errorType,
       Object relatedInformation, Location location) {
         System.err.println("Error in " 
          + location.getLocationURI());
         System.err.println("at line " 
          + location.getLineNumber()
          + ", column " + location.getColumnNumber());
         System.err.println(message);
     }
   });

Using SJSXP to Write XML Documents

Writing XML output is easy with SJSXP. In this case, you can use the XMLStreamWriter interface instead of the XMLStreamReader interface. The XMLStreamWriter interface provides direct methods to write elements, attributes, comments, text, and all the other parts of an XML document. The following example shows how to obtain this interface and use it to write an XML document:

   XMLOutputFactory xof = XMLOutputFactory.newInstance();
   XMLStreamWriter xtw = 
     xof.createXMLStreamWriter(new FileWriter("myFile"));

   // The XML declaration must come first in the document
   xtw.writeStartDocument("utf-8","1.0");
   xtw.writeComment(
     "all elements here are in the HTML namespace");
   xtw.setPrefix("html","http://www.w3.org/TR/REC-html40");
   xtw.writeStartElement(
     "http://www.w3.org/TR/REC-html40","html");
   xtw.writeNamespace(
     "html","http://www.w3.org/TR/REC-html40");
   xtw.writeStartElement(
     "http://www.w3.org/TR/REC-html40","head");
   xtw.writeStartElement(
     "http://www.w3.org/TR/REC-html40","title");
   xtw.writeCharacters("Java Information");
   xtw.writeEndElement();
   xtw.writeEndElement();

   xtw.writeStartElement(
     "http://www.w3.org/TR/REC-html40","body");
   xtw.writeStartElement("http://www.w3.org/TR/REC-html40","p");
   xtw.writeCharacters("Java homepage is ");
   xtw.writeStartElement("http://www.w3.org/TR/REC-html40","a");
   xtw.writeAttribute("href","http://java.sun.com");
   xtw.writeCharacters("here");
   xtw.writeEndElement();
   xtw.writeEndElement();
   xtw.writeEndElement();
   xtw.writeEndElement();
   xtw.writeEndDocument();

   xtw.flush();
   xtw.close();

When you finish writing out each of the elements, you need to flush and close the writer.

The preceding code will output the following XML (formatted here with line breaks for easier reading):

<?xml version="1.0" encoding="utf-8"?>
<!--all elements here are in the HTML namespace-->
<html:html xmlns:html="http://www.w3.org/TR/REC-html40">
<html:head>
<html:title>Java Information</html:title>
</html:head>
<html:body>
<html:p>
Java homepage is
<html:a href="http://java.sun.com">here</html:a>
</html:p>
</html:body>
</html:html>
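
For experimenting without touching the file system, the same writer calls can target a StringWriter. The following is a small sketch under assumed names (WriterDemo, writeGreeting, and the greeting element are all hypothetical); it also shows that writeCharacters() escapes markup characters such as the ampersand.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; names and sample data are illustrative only.
final class WriterDemo {

    // Writes a one-element document into memory and returns it as a string.
    static String writeGreeting(String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xtw.writeStartDocument();
            xtw.writeStartElement("greeting");
            xtw.writeAttribute("lang", "en");   // attributes go right after the start tag
            xtw.writeCharacters(text);          // special characters are escaped
            xtw.writeEndElement();
            xtw.writeEndDocument();
            xtw.flush();
            xtw.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write the sample document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeGreeting("Tips & tricks"));
    }
}
```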

Filtering XML Documents

You can create a filter for an incoming XML document if you don't want to scan through each item type. To do so, create a class that implements the javax.xml.stream.StreamFilter interface. This interface consists of only one method, accept(), that accepts an XMLStreamReader object and returns a primitive boolean. A typical implementation of StreamFilter looks like the following:

   public class MyStreamFilter implements StreamFilter {

       public boolean accept(XMLStreamReader reader) {
           if(!reader.isStartElement() && !reader.isEndElement())
               return false;
           else
               return true;
       }
   }

You then create a filtered reader by calling the createFilteredReader() method of the XMLInputFactory, and pass in both the original XML stream reader and the StreamFilter implementation. This is shown below:

   factory.createFilteredReader(
     factory.createXMLStreamReader(in), new MyStreamFilter());
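
The following sketch shows a filtered reader in use on an in-memory document. FilterDemo and its method are hypothetical names, and the filter is written as a lambda rather than a named class. Implementations may differ in where a filtered reader is positioned immediately after creation, so the sketch records the current event first and then those returned by next(); treat the exact event sequence as implementation dependent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names and sample data are illustrative only.
final class FilterDemo {

    // Returns the event codes observed through a filter that accepts only element tags.
    static List<Integer> filteredEventTypes(String xml) {
        List<Integer> seen = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            StreamFilter elementsOnly =
                reader -> reader.isStartElement() || reader.isEndElement();
            XMLStreamReader filtered = factory.createFilteredReader(
                factory.createXMLStreamReader(new StringReader(xml)), elementsOnly);

            seen.add(filtered.getEventType());   // event under the cursor right now
            while (filtered.hasNext()) {
                seen.add(filtered.next());       // later events that pass the filter
            }
            filtered.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(filteredEventTypes("<a><b>some text</b></a>"));
    }
}
```

The text inside the b element never appears as a CHARACTERS event, because the filter rejects it before the application sees it.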

For more information on the SJSXP, see the Sun Java Streaming XML Parser release notes.

Running the Sample Code for the Sun Java Streaming XML Parser

  1. Download the sample archive (ttfeb2005sjsxp.jar) for this tech tip.

  2. Download and install Java WSDP 1.5 from the Java Web Services Developer Pack Downloads page.

  3. Change to the directory where you downloaded the sample archive. Uncompress the JAR file for the sample archive as follows:
     jar xvf ttfeb2005sjsxp.jar
  4. Set your classpath to include the ttfeb2005sjsxp.jar and jsr173_api.jar files, which are located in the sjsxp/lib directory of the Java WSDP 1.5 installation.

  5. Compile and run the SJSXPInput executable.
    In response, you should see an entry similar to the following for each XML item:
     Event Type (Code=11): DTD
    Without a Name
    With Text: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
    Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-
    transitional.dtd">
    -----------------------------
  6. Compile and run the SJSXPOutput executable. The output will be sent to a file named XMLOutputFile and will consist of the elements shown in the output example above.

Copyright (c) 2004-2005 Sun Microsystems, Inc.
All Rights Reserved.

