I’m working with Apache mod_dav compiled on my own server. My client is built-from-scratch custom HTTP parsing code in Java. I’ve been using this server and code base for years, synchronizing gigabytes of data on the server.
Today I ran into a problem that has never cropped up before: the dreaded SAX “content is not allowed in trailing section” error. When doing WebDAV PROPFINDs throughout my whole server resource tree, I always get this error at the same location.
I’ve tested and retested my HTTP parsing code, but it’s pretty simple: Apache is sending back chunked content, and the chunks indicate the number of bytes to consume.
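For readers unfamiliar with the wire format, HTTP/1.1 chunked transfer coding works like this:

- Each chunk starts with its size in hexadecimal on its own CRLF-terminated line, optionally followed by ;-separated chunk extensions.
- That many bytes of data follow, then a CRLF.
- A chunk of size zero marks the end of the body. Optional trailer fields and an empty line come after it.

Below is a minimal sketch of parsing one size line, assuming the CRLF has already been stripped. ChunkSizeLine is a hypothetical name, not part of the author’s code or any library.

```java
/** Hypothetical helper: parses the size line that precedes each chunk of an HTTP/1.1 chunked body. */
final class ChunkSizeLine {

    private ChunkSizeLine() {}

    /**
     * Parses a size line such as "1f40" or "1f40;name=value" (CRLF already removed).
     * Returns the chunk size in bytes; 0 marks the last chunk.
     *
     * @throws NumberFormatException if the line does not start with a valid non-negative hex size
     */
    static long parse(final String line) {
        final int semicolon = line.indexOf(';');
        // Drop any chunk extensions after ';' and surrounding whitespace.
        final String hex = (semicolon >= 0 ? line.substring(0, semicolon) : line).trim();
        final long size = Long.parseLong(hex, 16); // throws NumberFormatException on bad input
        if (size < 0) {
            throw new NumberFormatException("Negative chunk size: " + line);
        }
        return size;
    }
}
```

For example, a size line of 1f40 announces an 8,000-byte chunk.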
It fails on an XML response that happens to use 110 chunks, significantly more than most other responses (this is a very large directory). However, my logs show that there is no “trailing content”. Each XML response ends with a simple linefeed character, both the one producing the error and the ones that don’t.
But even more distressing: I have an input stream that parses the HTTP chunked content and sends back a simple string of bytes. When I pass this input stream directly to the XML parser, I get the following error. However: if I take the same input stream and bleed all the bytes from it, put them in a ByteArrayInputStream, and then give the ByteArrayInputStream (which should contain the exact same data!) to the parser, no error occurs! What is it about parsing directly from the incoming data that causes the error?
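The buffering workaround can be sketched as follows. This assumes Java 9 or later for InputStream.readAllBytes(). BufferedXmlParse and parseBuffered are hypothetical names, and chunkedIn stands for whatever stream decodes the chunked response. On its own, this only changes how the bytes reach the parser; it doesn’t explain the difference in behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical helper illustrating the drain-then-parse workaround. */
final class BufferedXmlParse {

    private BufferedXmlParse() {}

    static Document parseBuffered(final InputStream chunkedIn)
            throws IOException, ParserConfigurationException, SAXException {
        // Drain the entire decoded response body into memory (Java 9+).
        final byte[] body = chunkedIn.readAllBytes();
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        final DocumentBuilder builder = factory.newDocumentBuilder();
        // ByteArrayInputStream returns -1 once its data is exhausted.
        return builder.parse(new ByteArrayInputStream(body));
    }
}
```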
My XML parser is pretty straightforward:
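```java
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
documentBuilderFactory.setValidating(false);
final DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
final Document document = documentBuilder.parse(inputStream); // inputStream: placeholder for the chunk-decoding stream
```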
Anyone seen this before? (I searched for “mod_dav XML bug”—and just got the unrelated bug I filed five years ago.)
Here is the relevant part of the stack trace:
```
Cause: org.xml.sax.SAXParseException: Content is not allowed in trailing section.
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
com.globalmentor.net.http.HTTPClientTCPConnection.readResponseBodyXML(HTTPClientTCPConnection.java:666)
com.globalmentor.net.http.webdav.WebDAVResource.propFind(WebDAVResource.java:453)
```
Update: I’ve run this test over and over. Finally, I added code to walk the chain of exception causes and print out the SAX parse location it reports:
Public Id: null System Id: null Line# 21937 Column# 1
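The cause-walking code itself isn’t shown. Here is a minimal sketch, assuming the SAXParseException appears somewhere in the cause chain of the exception that was caught. SaxDiagnostics and describeParseLocation are hypothetical names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import org.xml.sax.SAXParseException;

/** Hypothetical diagnostic helper: finds a SAXParseException in a cause chain and describes its location. */
final class SaxDiagnostics {

    private SaxDiagnostics() {}

    /** Returns a location description, or null if no SAXParseException is in the chain. */
    static String describeParseLocation(final Throwable throwable) {
        // Track visited throwables so a cyclic cause chain cannot loop forever.
        final Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = throwable; t != null && seen.add(t); t = t.getCause()) {
            if (t instanceof SAXParseException) {
                final SAXParseException spe = (SAXParseException) t;
                return "Public Id: " + spe.getPublicId()
                        + " System Id: " + spe.getSystemId()
                        + " Line# " + spe.getLineNumber()
                        + " Column# " + spe.getColumnNumber();
            }
        }
        return null;
    }
}
```

The output format mirrors the log line above.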
I copied the XML from the log file and, sure enough, line 21937 is the end of the file, but there is nothing there!!
Oh, man, this is one of the most aggravating and subtle bugs I’ve ever worked on! I was so tempted just to read the response XML into bytes and return a ByteArrayInputStream, even though I didn’t know why that fixed the problem. It turns out that it was my fault, kind of, technically, but still…

If you read the API contract of InputStream.read(byte b[], int off, int len), the method is never supposed to return zero bytes when at least one byte is requested! It must block until at least one byte is available and return the number of bytes read, or return -1 once the end of the stream is reached. (The Javadoc does say that if len is zero, no bytes are read and 0 is returned; that is the only case in which 0 is a legal result. A more modern API might throw an IllegalArgumentException if len < 1, but I digress.)

My HTTPChunkedInputStream automatically parses out the chunks of an HTTP chunked response. Consider a caller of HTTPChunkedInputStream.read(byte b[], int off, int len) that requested exactly the number of bytes available in the last chunk. The way the stream was written, it would not proactively try to load further chunks, so it would not recognize the end of the stream. That in itself is not a problem. But the next time the caller wanted more bytes, my stream would try to read another chunk, discover that there were no more chunks left, and then report that zero bytes were read! (Mind you, this only happened if the caller first requested exactly the number of bytes in the last chunk and then asked for more.) Any time after that it would return -1, as the end of the data had been hit.

In this particular case, for whatever reason, the XML parser asked for exactly the remaining bytes in the XML response from the WebDAV PROPFIND. Then the parser wanted to check whether there were further characters. The actual reading happens in UTF8Reader. When my stream reported that zero bytes were read, this was passed up to XMLEntityScanner. Neither of these classes knows how to handle “no bytes were read”; they just assume something was read. Finally, XMLDocumentScannerImpl checks what that “something” was, on line 1453 of its source. Because the end of the stream wasn’t indicated, it assumed there was “something” there, and that this something must be illegal trailing content.
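One way to catch this kind of violation early is to wrap a stream under test in a filter. The filter fails fast whenever read(byte[], int, int) returns 0 for a nonzero request. Here is a minimal debugging sketch; ContractCheckingInputStream is a hypothetical class, not part of the JDK or the author’s code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical debug wrapper: throws if the wrapped stream returns 0 from
 * read(byte[], int, int) when at least one byte was requested. The InputStream
 * contract only allows blocking, returning at least 1, or returning -1 in that case.
 */
final class ContractCheckingInputStream extends FilterInputStream {

    ContractCheckingInputStream(final InputStream in) {
        super(in);
    }

    @Override
    public int read(final byte[] b, final int off, final int len) throws IOException {
        final int count = super.read(b, off, len); // delegates to the wrapped stream
        if (len > 0 && count == 0) {
            throw new IllegalStateException("read(byte[], int, int) returned 0 for len=" + len);
        }
        return count;
    }
}
```

FilterInputStream.read(byte[]) delegates to read(byte[], int, int), so bulk reads made through either method pass through the check.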
Whew! I’ve fixed my HTTPChunkedInputStream class so that read() never returns zero bytes when at least one byte is requested. I am exhausted; this is one of those bugs that turns up only infrequently, under specific conditions. And when I read the bytes and returned them in a ByteArrayInputStream, the problem didn’t show up. My code to suck the bytes out of the HTTPChunkedInputStream never requested exactly the number of bytes in the last chunk. Even if it had, it would simply have appended those zero bytes to the buffer along with the others and kept reading.
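For comparison, here is a minimal sketch of a chunk-decoding stream that avoids the zero-byte return. Whenever the current chunk is exhausted, it advances to the next chunk header, so reaching the zero-size last chunk is reported as -1. ChunkedBodyInputStream is a hypothetical name. This is not the author’s actual HTTPChunkedInputStream, and it skips trailer fields rather than exposing them.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical minimal decoder for an HTTP/1.1 chunked body that never returns 0 when len > 0. */
final class ChunkedBodyInputStream extends InputStream {
    private final InputStream in;
    private long remainingInChunk = 0; // bytes left in the current chunk
    private boolean finished = false;  // true once the zero-size last chunk has been consumed

    ChunkedBodyInputStream(final InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        final byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(final byte[] b, final int off, final int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0; // the only case in which 0 is a legal result
        }
        if (remainingInChunk == 0 && !finished) {
            nextChunk(); // advance before deciding whether any data remains
        }
        if (finished) {
            return -1; // end of body: report -1, never 0
        }
        final int n = in.read(b, off, (int) Math.min(len, remainingInChunk));
        if (n == -1) {
            throw new EOFException("Connection closed inside a chunk");
        }
        remainingInChunk -= n;
        if (remainingInChunk == 0 && !readLine().isEmpty()) { // chunk data is followed by CRLF
            throw new IOException("Expected CRLF after chunk data");
        }
        return n;
    }

    /** Reads the next size line; a size of 0 ends the body after skipping any trailer fields. */
    private void nextChunk() throws IOException {
        final String line = readLine();
        final int semicolon = line.indexOf(';'); // ignore chunk extensions
        final long size;
        try {
            size = Long.parseLong((semicolon >= 0 ? line.substring(0, semicolon) : line).trim(), 16);
        } catch (final NumberFormatException e) {
            throw new IOException("Invalid chunk size line: " + line, e);
        }
        if (size < 0) {
            throw new IOException("Negative chunk size: " + line);
        }
        if (size == 0) {
            while (!readLine().isEmpty()) {
                // skip trailer fields up to the terminating empty line
            }
            finished = true;
        } else {
            remainingInChunk = size;
        }
    }

    /** Reads an ASCII line terminated by LF, dropping a trailing CR. */
    private String readLine() throws IOException {
        final StringBuilder sb = new StringBuilder();
        for (int c = in.read(); c != '\n'; c = in.read()) {
            if (c == -1) {
                throw new EOFException("Connection closed while reading a line");
            }
            sb.append((char) c);
        }
        final int last = sb.length() - 1;
        if (last >= 0 && sb.charAt(last) == '\r') {
            sb.setLength(last);
        }
        return sb.toString();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Reading exactly the bytes of the last chunk and then asking for more now yields -1 rather than 0, which is the case that tripped up the XML parser.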