Reading XML Data into a DOM

In this section, you'll construct a Document Object Model by reading in an existing XML file. In the following sections, you'll see how to display the XML in a Swing tree component and practice manipulating the DOM.

Note: In Chapter 7, you'll see how to write out a DOM as an XML file. (You'll also see how to convert an existing data file into XML with relative ease.)

Creating the Program

The Document Object Model provides APIs that let you create, modify, delete, and rearrange nodes. So it is relatively easy to create a DOM, as you'll see later in Creating and Manipulating a DOM.

Before you try to create a DOM, however, it is helpful to understand how a DOM is structured. This series of exercises will make DOM internals visible by displaying them in a Swing JTree.

Create the Skeleton

Now let's build a simple program to read an XML document into a DOM and then write it back out again.

Start with the normal basic logic for an app, and check to make sure that an argument has been supplied on the command line:

public class DomEcho {
  public static void main(String argv[])
  {
    if (argv.length != 1) {
      System.err.println(
          "Usage: java DomEcho filename");
      System.exit(1);
    }
  }// main
}// DomEcho

Import the Required Classes

In this section, all the classes are individually named so that you can see where each class comes from when you want to reference the API documentation. In your own applications, you may well want to replace the import statements shown here with the shorter form, such as javax.xml.parsers.*.

import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
import javax.xml.parsers.FactoryConfigurationError; 
import javax.xml.parsers.ParserConfigurationException;

Add these lines for the exceptions that can be thrown when the XML document is parsed:

import org.xml.sax.SAXException; 
import org.xml.sax.SAXParseException;

Add these lines to read the sample XML file and to report I/O errors:

import java.io.File;
import java.io.IOException;

Finally, import the W3C definitions for a DOM and for DOM exceptions:

import org.w3c.dom.Document;
import org.w3c.dom.DOMException;

Note: A DOMException is thrown only when traversing or manipulating a DOM. Errors that occur during parsing are reported using a different mechanism that is covered later.
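
To make that distinction concrete, here is a minimal sketch. The class name DomExceptionSketch and its helper method are hypothetical and are not part of DomEcho. It assumes the parser bundled with the JDK, which rejects a second root element with a DOMException whose code is HIERARCHY_REQUEST_ERR. No XML is parsed at all; the exception comes purely from manipulating the tree.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the tutorial's DomEcho program.
class DomExceptionSketch {

    // Tries to add a second root element to a new document.
    // Returns the DOMException code, or -1 if no exception was thrown.
    static short secondRootErrorCode() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("first"));
            // A document may hold only one root element, so this call
            // is expected to fail during manipulation, not parsing.
            doc.appendChild(doc.createElement("second"));
            return -1;
        } catch (DOMException e) {
            // DOMException is unchecked; its public "code" field says why.
            return e.code;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        short code = secondRootErrorCode();
        System.out.println(code == DOMException.HIERARCHY_REQUEST_ERR
                ? "HIERARCHY_REQUEST_ERR" : "code " + code);
    }
}
```

Because DOMException is a runtime exception, the compiler does not force you to catch it, which is one reason it does not appear in the parsing catch clauses later in this section.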

Declare the DOM

The org.w3c.dom.Document interface is the W3C name for a DOM. Whether you parse an XML document or create one, a Document instance will result. You'll want to reference that object from another method later, so define it as a static field of the class here:

public class DomEcho
{ 
  static Document document;

  public static void main(String argv[])
  {

It needs to be static because you'll generate its contents from the main method in a few minutes.
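
As a rough illustration of both points (a shared static field, and a Document resulting whether you parse or create), consider the sketch below. The class DocumentFieldSketch and its methods parse, create, and rootName are hypothetical names chosen for this example; DomEcho itself only parses a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration; not part of the tutorial's DomEcho program.
class DocumentFieldSketch {
    // Static so that the static main method and other static helpers share it.
    static Document document;

    // Fills the field by parsing XML text.
    static void parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        document = builder.parse(new InputSource(new StringReader(xml)));
    }

    // Fills the field by creating a document and giving it a root element.
    static void create(String rootName) throws ParserConfigurationException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        document = builder.newDocument();
        document.appendChild(document.createElement(rootName));
    }

    // A second method that reads the shared field.
    static String rootName() {
        return document.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        parse("<slideshow/>");
        System.out.println("parsed root:  " + rootName());
        create("slideshow");
        System.out.println("created root: " + rootName());
    }
}
```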

Handle Errors

Next, put in the error-handling logic. This logic is basically the same as the code you saw in Handling Errors with the Nonvalidating Parser in Chapter 5, so we don't go into it in detail here. The major point is that a JAXP-conformant document builder is required to report SAX exceptions when it has trouble parsing the XML document. The DOM parser does not have to actually use a SAX parser internally, but because the SAX standard is already there, it makes sense to use it for reporting errors. As a result, the error-handling code for DOM applications is very similar to that for SAX applications:

public static void main(String argv[])
{
  if (argv.length != 1) {
    ...
  }

  try {

  } catch (SAXParseException spe) {
    // Error generated by the parser
    System.out.println("\n** Parsing error"
      + ", line " + spe.getLineNumber()
      + ", uri " + spe.getSystemId());
    System.out.println("   " + spe.getMessage());

    // Use the contained exception, if any
    Exception x = spe;
    if (spe.getException() != null)
      x = spe.getException();
    x.printStackTrace();

  } catch (SAXException sxe) {
    // Error generated during parsing
    Exception x = sxe;
    if (sxe.getException() != null)
      x = sxe.getException();
    x.printStackTrace();

  } catch (ParserConfigurationException pce) {
    // Parser with specified options can't be built
    pce.printStackTrace();

  } catch (IOException ioe) {
    // I/O error
    ioe.printStackTrace();
  }

}// main
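
To see which of these catch clauses fires for a malformed document, the following sketch parses XML text held in a string and reports the outcome. The class ParseErrorSketch and its tryParse method are hypothetical, and reading from a StringReader instead of a file is an assumption made only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the tutorial's DomEcho program.
class ParseErrorSketch {

    // Parses the given XML text and describes what happened.
    static String tryParse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException spe) {
            // Most specific SAX exception first: it carries a location.
            return "parse error at line " + spe.getLineNumber()
                + ": " + spe.getMessage();
        } catch (SAXException sxe) {
            return "SAX error: " + sxe.getMessage();
        } catch (ParserConfigurationException pce) {
            return "configuration error: " + pce.getMessage();
        } catch (IOException ioe) {
            return "I/O error: " + ioe.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<a><b/></a>"));   // well formed
        System.out.println(tryParse("<a>\n<b>\n</a>")); // <b> is never closed
    }
}
```

The SAXParseException clause has to come before the SAXException clause because it is the subclass; in the other order the code would not compile. Depending on the parser, a default error handler may also print its own message to standard error before the exception reaches your catch clause.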

Instantiate the Factory

Next, add the two DocumentBuilderFactory lines shown below, just before the try block, to obtain an instance of a factory that can give us a document builder:

public static void main(String argv[])
{
  if (argv.length != 1) {
    ...
  }
  DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
  try {

Get a Parser and Parse the File

Now, add the two lines shown inside the try block below to get an instance of a builder, and use it to parse the specified file:

try {
  DocumentBuilder builder = factory.newDocumentBuilder();
  document = builder.parse( new File(argv[0]) );
} catch (SAXParseException spe) {

Note: By now, you should be getting the idea that every JAXP application starts in pretty much the same way. You're right! Save this version of the file as a template. You'll use it later on as the basis for an XSLT transformation application.
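
Putting the steps so far together, a compact version of the program might look like the sketch below. It is not the tutorial's DomEcho listing: the class name DomEchoSketch and the summarize method are hypothetical, and the one-line summary is added only so that a successful parse produces visible output.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical variant of DomEcho that reports what it parsed.
class DomEchoSketch {
    static Document document;

    // Parses the file and returns a one-line summary of the root element.
    static String summarize(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.parse(xmlFile);
        int children =
            document.getDocumentElement().getChildNodes().getLength();
        return "root <" + document.getDocumentElement().getNodeName()
            + "> with " + children + " child node(s)";
    }

    public static void main(String[] argv) {
        if (argv.length != 1) {
            System.err.println("Usage: java DomEchoSketch filename");
            System.exit(1);
        }
        try {
            System.out.println(summarize(new File(argv[0])));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Collapsed for brevity; the tutorial handles each case separately.
            e.printStackTrace();
        }
    }
}
```

The child count includes text nodes, so whitespace between elements in a pretty-printed file adds to it.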

Run the Program

Throughout most of the DOM tutorial, you'll use the sample slide shows you saw in Chapter 5. In particular, you'll use slideSample01.xml, a simple XML file with nothing much in it, and slideSample10.xml, a more complex example that includes a DTD, processing instructions, entity references, and a CDATA section.

For instructions on how to compile and run your program, see Compiling and Running the Program from Chapter 5. Substitute DomEcho for Echo as the name of the program, and you're ready to roll.

For now, just run the program on slideSample01.xml. If it runs without error, you have successfully parsed an XML document and constructed a DOM. Congratulations!

Note: You'll have to take my word for it, for the moment, because at this point you don't have any way to display the results. But that feature is coming shortly...

Additional Information

Now that you have successfully read in a DOM, there are one or two more things you need to know in order to use DocumentBuilder effectively: how to configure the factory and how to handle validation errors.

Configuring the Factory

By default, the factory returns a nonvalidating parser that knows nothing about namespaces. To get a validating parser, or one that understands namespaces (or both), you configure the factory to set either or both of those options using the setValidating and setNamespaceAware calls shown below:

public static void main(String argv[])
{
  if (argv.length != 1) {
    ...
  }
  DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
  factory.setValidating(true);
  factory.setNamespaceAware(true);
  try {
    ...

Note: JAXP-conformant parsers are not required to support all combinations of those options, even though the reference parser does. If you specify an invalid combination of options, the factory generates a ParserConfigurationException when you attempt to obtain a parser instance.
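
One way to observe what the namespace option changes is to ask the root element for its namespace URI under both settings. The sketch below does that; the class FactoryConfigSketch, the rootNamespace method, and the urn:example:slides namespace are all made up for this illustration. With the JDK's bundled parser, the URI should be reported only when namespace awareness is switched on.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the tutorial's DomEcho program.
class FactoryConfigSketch {

    // Parses the text with the requested namespace setting and returns
    // the namespace URI of the root element (null if none is recorded).
    static String rootNamespace(String xml, boolean namespaceAware)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Options must be set on the factory before the builder is created.
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<s:slideshow xmlns:s=\"urn:example:slides\"/>";
        System.out.println("aware:   " + rootNamespace(xml, true));
        System.out.println("unaware: " + rootNamespace(xml, false));
    }
}
```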

You'll learn more about how to use namespaces in Validating with XML Schema. To complete this section, though, you'll want to learn something about handling validation errors.

Handling Validation Errors

Remember when you were wading through the SAX tutorial in Chapter 5, and all you really wanted to do was construct a DOM? Well, now that information begins to pay off.

Recall that the default response to a validation error, as dictated by the SAX standard, is to do nothing. The JAXP standard requires throwing SAX exceptions, so you use exactly the same error-handling mechanisms as you use for a SAX application. In particular, you use the DocumentBuilder's setErrorHandler method to supply it with an object that implements the SAX ErrorHandler interface.

The following code uses an anonymous inner class to define that ErrorHandler. The error method makes sure that validation errors generate an exception.

builder.setErrorHandler(
  new org.xml.sax.ErrorHandler() {
    // ignore fatal errors (an exception is guaranteed)
    public void fatalError(SAXParseException exception)
    throws SAXException {
    }

    // treat validation errors as fatal
    public void error(SAXParseException e)
    throws SAXParseException
    {
      throw e;
    }

    // dump warnings too
    public void warning(SAXParseException err)
    throws SAXParseException
    {
      System.out.println("** Warning"
        + ", line " + err.getLineNumber()
        + ", uri " + err.getSystemId());
      System.out.println("   " + err.getMessage());
    }
  }
);

This code uses an anonymous inner class to generate an instance of an object that implements the ErrorHandler interface. It's "anonymous" because it has no class name. You can think of it as an "ErrorHandler" instance, although technically it's a no-name instance that implements the specified interface. The code is substantially the same as that described in Handling Errors with the Nonvalidating Parser. For a more complete background on validation issues, refer to Using the Validating Parser.
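
The effect of throwing from the error method can be sketched with a small document that carries its own DTD. Everything named here is an assumption for the sake of the example: the class ValidationSketch, the check method, the strict flag, and the two-element DTD. When strict is true, the handler rethrows validation errors as the code above does; when it is false, the handler ignores them and the invalid document is still parsed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the tutorial's DomEcho program.
class ValidationSketch {
    // Made-up internal DTD: a slideshow holds one or more text-only slides.
    static final String DTD =
        "<!DOCTYPE slideshow [\n"
      + "  <!ELEMENT slideshow (slide+)>\n"
      + "  <!ELEMENT slide (#PCDATA)>\n"
      + "]>\n";

    // Validates DTD + body and reports whether the parse completed.
    static String check(String body, boolean strict) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }
                @Override
                public void error(SAXParseException e)
                        throws SAXParseException {
                    // Validation errors arrive here; rethrow only if strict.
                    if (strict) throw e;
                }
                @Override
                public void fatalError(SAXParseException e)
                        throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return "accepted";
        } catch (SAXParseException e) {
            return "rejected: " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        String valid = "<slideshow><slide>One</slide></slideshow>";
        String invalid = "<slideshow><bogus/></slideshow>";
        System.out.println(check(valid, true));
        System.out.println(check(invalid, true));
        System.out.println(check(invalid, false));
    }
}
```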

Looking Ahead

In the next section, you'll display the DOM structure in a JTree and begin to explore its structure. For example, you'll see what entity references and CDATA sections look like in the DOM. And perhaps most importantly, you'll see how text nodes (which contain the actual data) reside under element nodes in a DOM.
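
As a small preview of that last point, the sketch below parses a one-element document and inspects the element and its first child. The class TextNodeSketch and the describe method are hypothetical names, and the sample title text is invented for the example. The element itself has no node value; the character data lives in a child node named #text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the tutorial's DomEcho program.
class TextNodeSketch {

    // Describes the root element and its first child node.
    static String describe(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        Node child = root.getFirstChild();
        String kind = (child.getNodeType() == Node.TEXT_NODE)
            ? "text" : "other";
        return root.getNodeName() + " value=" + root.getNodeValue()
            + "; " + kind + " child " + child.getNodeName()
            + " value=" + child.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe("<title>Wake up!</title>"));
    }
}
```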