StaxMate Tutorial
As described in the introduction, StaxMate is designed to allow:
- Reading XML content efficiently, conveniently and correctly
- Writing XML content efficiently, conveniently and correctly
To see what this means in practice, let's look at a few simple use cases.
Writing an XML document
Let's start with a particularly simple and common use case: writing (aka generating) XML content. Content can come from a variety of sources; here we only consider generation, not where the data comes from.
Let's say we want to output an XML document like this one:
<!-- generated: [CURRENT TIME]-->
<employee id="123">
 <name>
  <first>Tatu</first>
  <last>Saloranta</last>
 </name>
</employee>
Let's first look at one possible piece of code to output such a document:
import java.io.File;
import javax.xml.stream.XMLOutputFactory;
import org.codehaus.staxmate.SMOutputFactory;
import org.codehaus.staxmate.out.SMOutputDocument;
import org.codehaus.staxmate.out.SMOutputElement;

// 1: need output factory
SMOutputFactory outf = new SMOutputFactory(XMLOutputFactory.newInstance());
// 2: create the output document (here, written to a file)
SMOutputDocument doc = outf.createOutputDocument(new File("empl.xml"));
// 3 (optional): enable indentation (note the space after the "\n" escape!)
doc.setIndentation("\n ", 1, 1);
// 4: comment regarding generation time
doc.addComment(" generated: "+new java.util.Date().toString());
// 5-6: root element and its typed 'id' attribute
SMOutputElement empl = doc.addElement("employee");
empl.addAttribute(/*namespace*/ null, "id", 123);
// 7-9: 'name' element with its 'first' and 'last' children
SMOutputElement name = empl.addElement("name");
name.addElement("first").addCharacters("Tatu");
name.addElement("last").addCharacters("Saloranta");
// 10: close the document to close elements, flush output
doc.closeRoot();
So how does that work? Here is what is being done and why:
- First we create a StaxMate output factory; here we rely on the automatic discovery that the Stax XMLOutputFactory offers (to find whichever implementation is plugged in).
* This output factory is fully thread-safe (after configuration) and should be reused: usually a single(ton) instance is enough for the whole application or service.
- Create the document output object. The document object denotes the document itself, not a root element; we add the root element under it (and could also add comments and processing instructions). In this case, we write to an XML file.
- Optionally "pretty print" the output document by enabling indentation.
- It is often useful to add XML comments with developer-readable information about the generating process; XML readers can easily ignore them.
- Add the 'employee' element.
- Add attribute 'id' with a typed value (StaxMate converts the number to a String).
- Add the 'name' element.
- Add the 'first' element and its textual content.
* (note: in StaxMate 2.0 you could use "name.addElementWithCharacters()" to simplify this!)
- Similarly, add 'last' and its textual content.
- Important: you MUST close the root-level object; otherwise start elements may not get closed and content may not be flushed to the file.
Some things to consider:
- Although we use XMLOutputFactory implementation auto-discovery here, it is often preferable to pass the factory in from outside, perhaps using a Dependency Injection framework (you can then specify which implementation to use; the recommended one is com.ctc.wstx.stax.WstxOutputFactory, for Woodstox).
- Indentation usually should NOT be used for production systems -- since it just adds 20-30% to document size without any useful additional information -- but it can be convenient during development and debugging.
- Instead of writing contents to a file, we could have used a ByteArrayOutputStream, StringWriter, or a servlet's OutputStream as well; there are many convenience methods for typical targets.
- Typed conversion for the 'id' attribute value is just one example of the ability to use Java types for output, without having to convert to Strings first.
- Methods that add child containers (SMOutputElement usually) can be chained, if the element itself is not needed for anything else; this can shorten the code nicely without reducing readability.
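To make the points about factory injection and alternative output targets concrete, here is a minimal sketch of a writer whose Stax factory is handed in by the caller and whose output goes to a StringWriter instead of a file. It deliberately uses only the plain Stax API bundled with the JDK, so it runs without StaxMate on the classpath; `EmployeeXmlWriter`, `write` and `writeLeaf` are hypothetical names chosen for illustration, not part of StaxMate or Stax. The same injected XMLOutputFactory could equally be passed to `new SMOutputFactory(...)`.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper (not a StaxMate class): the factory is injected,
// so the caller decides which Stax implementation is used.
class EmployeeXmlWriter {
    private final XMLOutputFactory staxFactory;

    EmployeeXmlWriter(XMLOutputFactory staxFactory) {
        this.staxFactory = staxFactory;
    }

    // Writes one employee element into memory and returns it as a String.
    String write(int id, String first, String last) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xw = staxFactory.createXMLStreamWriter(out);
        xw.writeStartElement("employee");
        xw.writeAttribute("id", String.valueOf(id));
        xw.writeStartElement("name");
        writeLeaf(xw, "first", first);
        writeLeaf(xw, "last", last);
        xw.writeEndElement(); // name
        xw.writeEndElement(); // employee
        xw.flush();
        xw.close(); // a StringWriter needs no separate close
        return out.toString();
    }

    // Writes <name>text</name>.
    private static void writeLeaf(XMLStreamWriter xw, String name, String text)
            throws XMLStreamException {
        xw.writeStartElement(name);
        xw.writeCharacters(text);
        xw.writeEndElement();
    }

    public static void main(String[] args) throws XMLStreamException {
        // In a real application the factory would come from configuration
        // or a DI container rather than from auto-discovery.
        EmployeeXmlWriter writer = new EmployeeXmlWriter(XMLOutputFactory.newInstance());
        System.out.println(writer.write(123, "Tatu", "Saloranta"));
    }
}
```

Because the factory arrives through the constructor, switching to a specific implementation only changes the code that constructs `EmployeeXmlWriter`, not the writing logic.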
Reading XML content
Now that we have written some XML content, let's read it back in. Let's start with the code (which uses the XML document we saw earlier):
import java.io.File;
import javax.xml.stream.XMLInputFactory;
import org.codehaus.staxmate.SMInputFactory;
import org.codehaus.staxmate.in.SMHierarchicCursor;
import org.codehaus.staxmate.in.SMInputCursor;

// 1: need input factory
SMInputFactory inf = new SMInputFactory(XMLInputFactory.newInstance());
// 2: and root cursor that reads XML document from File:
SMHierarchicCursor rootC = inf.rootElementCursor(new File("empl.xml"));
rootC.advance(); // note: 2.0 only method; can also call ".getNext()"
int employeeId = rootC.getAttrIntValue(0);
// child cursors are declared as SMInputCursor (the supertype of SMHierarchicCursor)
SMInputCursor nameC = rootC.childElementCursor("name").advance();
SMInputCursor leafC = nameC.childElementCursor().advance();
String first = leafC.collectDescendantText(false);
leafC.advance();
String last = leafC.collectDescendantText(false);
// close the reader and the underlying input source
rootC.getStreamReader().closeCompletely();
So how does that work? Here is what is being done and why:
- First we create a StaxMate input factory, much like constructing the output factory.
- Create the root cursor; it traverses only the root element and ignores non-elements such as comments.
- Cursors are initially not positioned over an event, so we need to advance.
- Since we know the cursor must now point to the root element, we can access the employee id attribute.
- Create a child cursor that filters out everything except "name" elements, and advance it to the first (and only) "name" child element.
- Construct the innermost cursor for traversing the immediate children of "name", and advance it to the first child ("first").
- Collect all textual content.
- Advance to the second child ("last").
- Collect all textual content.
- Close the underlying stream reader (important!).
Here are some more things to consider:
- As with output factory, usually it's better to inject specific factory
- Typed access works for cursors as well as for output elements: note, too, that both typed attribute values and element value can be handled (example only had typed attribute values)
- Cursors initially do not point to an event -- it is possible there are no events to point to, even -- so one must always advance cursor after construction.
* With StaxMate 2.0 this can be done with SMInputCursor.advance() call that is chainable (equivalent to 'getNext()', but instead of event type, returns cursor itself), and therefore works nicely with child-cursor construction calls.
- The code above is not very robust: specifically, it does not verify that the elements are as expected. For example, what if the "first" and "last" elements were switched? Production code should add a few more lines for checking. Ditto for attribute access.
- The last line, closing the underlying stream reader, is currently an important thing to do, to ensure that the underlying input source (the File input stream) gets closed.
* This is one area where StaxMate API should be improved in future.
Better than Stax 1.0?
Since the original claim was that StaxMate makes things more convenient, let's see what the equivalent code for writing would look like if we didn't have StaxMate:
import java.io.File;
import java.io.FileOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// 1: need output factory, writer
XMLOutputFactory outf = XMLOutputFactory.newInstance();
FileOutputStream fos = new FileOutputStream(new File("empl.xml"));
XMLStreamWriter xw = outf.createXMLStreamWriter(fos, "UTF-8");
// No way to do automated indentation: must write manually
xw.writeStartDocument(); // not needed with StaxMate
xw.writeComment(" generated: "+new java.util.Date().toString());
xw.writeStartElement("employee");
String idStr = String.valueOf(123);
xw.writeAttribute("id", idStr);
xw.writeCharacters("\n "); // indent
xw.writeStartElement("name");
xw.writeCharacters("\n "); // indent
xw.writeStartElement("first");
xw.writeCharacters("Tatu");
xw.writeEndElement(); // for first
xw.writeCharacters("\n "); // indent
xw.writeStartElement("last");
xw.writeCharacters("Saloranta");
xw.writeEndElement(); // for last
xw.writeCharacters("\n "); // indent
xw.writeEndElement(); // for name
xw.writeCharacters("\n"); // indent
xw.writeEndElement(); // for employee
xw.writeEndDocument();
xw.close(); // as per Stax 1.0, won't close stream!
fos.close(); // so we need this too
So what does this tell us?
- Code with Stax 1.0 is quite a bit more verbose: with indentation, more than twice as many lines, and even without it, about 50% more.
* And the extra length does not buy readability: if anything, StaxMate's compactness tends to make code easier to read.
- The non-scoped nature of writing adds redundancy -- end elements must be written explicitly -- and this can easily cause bugs ("which start element did this match with again?"; hence the comments above).
Similarly we could show the alternative for reading XML content: but unfortunately that code would be even more verbose. So to keep this tutorial brief, we'll leave that exercise to readers.
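As a starting point for that exercise, here is a rough sketch of reading the employee document with the plain Stax API from the JDK. It is one possible approach, not a canonical equivalent of the StaxMate code: `PlainStaxEmployeeReader`, its nested `Employee` holder and the `expectStart` helper are hypothetical names, and the input is an in-memory copy of the sample document rather than the file. Unlike the StaxMate example, it checks element names at each step.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, shown only for comparison with the StaxMate reading code.
class PlainStaxEmployeeReader {
    // Simple holder for the three values we extract.
    static final class Employee {
        final int id;
        final String first;
        final String last;

        Employee(int id, String first, String last) {
            this.id = id;
            this.first = first;
            this.last = last;
        }
    }

    static Employee read(InputStream in) throws XMLStreamException {
        XMLStreamReader sr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            expectStart(sr, "employee"); // nextTag() skips the leading comment
            int id = Integer.parseInt(sr.getAttributeValue(null, "id"));
            expectStart(sr, "name");
            expectStart(sr, "first");
            String first = sr.getElementText(); // leaves the reader at </first>
            expectStart(sr, "last");
            String last = sr.getElementText();
            return new Employee(id, first, last);
        } finally {
            sr.close(); // per Stax 1.0 this does not close 'in'; the caller must
        }
    }

    // Advances to the next tag and verifies that it is the expected start element.
    private static void expectStart(XMLStreamReader sr, String localName)
            throws XMLStreamException {
        sr.nextTag();
        sr.require(XMLStreamConstants.START_ELEMENT, null, localName);
    }

    public static void main(String[] args) throws XMLStreamException, IOException {
        String xml = "<!-- generated: now -->\n<employee id=\"123\">\n <name>\n"
                + "  <first>Tatu</first>\n  <last>Saloranta</last>\n </name>\n</employee>";
        // try-with-resources closes the stream that the Stax reader leaves open
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Employee e = read(in);
            System.out.println(e.id + ": " + e.first + " " + e.last);
        }
    }
}
```

Even with `nextTag()` and `getElementText()` doing much of the work, each step has to spell out the expected event, and closing the input remains the caller's job.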
Reading XML content, part 2
Let's have a look at another example, with a slightly different structure:
<data>
 <id>123</id>
 <name>Template</name>
 <desc>Longer description</desc>
</data>
Assuming we wanted to map this to a simple Data object, we could use:
import java.io.File;
import javax.xml.stream.XMLInputFactory;
import org.codehaus.staxmate.SMInputFactory;
import org.codehaus.staxmate.in.SMInputCursor;

SMInputFactory inf = new SMInputFactory(XMLInputFactory.newInstance());
SMInputCursor rootC = inf.rootElementCursor(new File("data.xml")).advance();
SMInputCursor valueC = rootC.childElementCursor();

int id = 0;
String name = null, desc = null;
while (valueC.getNext() != null) { // points to START_ELEMENT, null when no more
    String elem = valueC.getLocalName();
    // typed values are read from the cursor, not from the element name
    if ("id".equals(elem)) {
        id = valueC.getElemIntValue();
    } else if ("name".equals(elem)) {
        name = valueC.getElemStringValue();
    } else if ("desc".equals(elem)) {
        desc = valueC.getElemStringValue();
    } else {
        throw new IllegalArgumentException("Unexpected element '"+elem+"'");
    }
}
rootC.getStreamReader().closeCompletely();
return new Data(id, name, desc);
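The `Data` class itself is not shown above. A minimal sketch of what it might look like, assuming a plain immutable value object whose fields mirror the three child elements (the accessor names and `toString()` format are illustrative choices, not something the tutorial prescribes):

```java
// Hypothetical value object for the <data> element; not part of StaxMate.
final class Data {
    private final int id;
    private final String name;
    private final String desc;

    Data(int id, String name, String desc) {
        this.id = id;
        this.name = name;
        this.desc = desc;
    }

    int getId() { return id; }

    // May be null if the document had no <name> element.
    String getName() { return name; }

    // May be null if the document had no <desc> element.
    String getDesc() { return desc; }

    @Override
    public String toString() {
        return "Data[id=" + id + ", name=" + name + ", desc=" + desc + "]";
    }
}
```

Since the reading loop leaves `name` and `desc` as null when an element is missing, production code would probably also want to decide which of them are mandatory and check for that before constructing the object.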
Advanced Use Cases
Here are links to some use cases that show more advanced usage:
TO BE WRITTEN: UseCaseReadTracking (tracking allows retaining some tree-structure/attribute information during traversal)
TO BE WRITTEN: (buffering allows for limited out-of-order output; like ability to add attributes to an element after writing child elements)
