I’d like to perform XPath queries on an XML document online. I’ve set up

Asked: June 18, 2026

I’d like to perform XPath queries on an XML document online. I’ve set up InputStreams that retrieve the content and prepend a <?xml ...?> header declaring the encoding given in the charset field of the HTTP response. Although it works, it’s painfully slow.

    // bis is the BufferedInputStream with the content part of the HTTP reply
    docBuilder = docBuilderFactory.newDocumentBuilder(); // may throw ParserConfigurationException
    Document doc = docBuilder.parse(
        new PrependInputStream(bis,
            "<?xml version='1.0' encoding='" + charset + "' ?>\r\n"));

(Please allow me not to post my whole source this time: I’m preparing an assignment for students.)

Some strace analysis revealed that the program stalls when contacting w3.org:

 send(8, "GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1\r\nUser-Agent: Java/1.6.0_17\r\nHost: www.w3.org\r\nAccept: 
      text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\r\nConnection: keep-alive\r\n\r\n", 186, 0)
 recv(8, ...

As I don’t care much whether the HTML content is valid (well-formed should be enough), I tried docBuilderFactory.setValidating(false), but that doesn’t seem to prevent online retrieval of the DTD.

Manually setting a schema with docBuilderFactory.setSchema(), using the same DTD file retrieved by hand, results in an “org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.” (That was not a good idea.)

Where am I over-complicating things?

(The XML backend seems to be com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema, as far as I can tell from stack traces, if that’s of any use.)

1 Answer

Editorial Team · Answer 1 · 2026-06-18T14:33:33+00:00

HTML DTDs are huge and pull in further files through includes. You are right that fetching them takes forever. Use an XML catalog: it lets you store the DTDs locally and map them by their system ID.
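
As a rough sketch of what that can look like in code, assuming Java 9 or later (where the JDK ships javax.xml.catalog): the temporary directory, the class name CatalogDemo and the one-line stand-in DTD are made up for illustration. A real setup would keep a downloaded copy of the W3C DTD and its entity files next to the catalog.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CatalogDemo {
    // System ID that XHTML 1.0 Transitional pages declare in their DOCTYPE.
    static final String XHTML_DTD =
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    /** Parses a tiny XHTML page through a catalog and returns its text content. */
    static String parseWithCatalog() {
        try {
            // Hypothetical local cache; the one-line DTD stands in for a real copy.
            Path dir = Files.createTempDirectory("dtd-cache");
            Files.write(dir.resolve("xhtml1-transitional.dtd"),
                "<!ENTITY nbsp \"&#160;\">".getBytes(StandardCharsets.UTF_8));

            // The catalog maps the remote system ID to the file stored next to it.
            String catalogXml =
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + "<system systemId=\"" + XHTML_DTD + "\" uri=\"xhtml1-transitional.dtd\"/>"
                + "</catalog>";
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, catalogXml.getBytes(StandardCharsets.UTF_8));

            CatalogResolver resolver = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalogFile.toUri());

            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // CatalogResolver is also an org.xml.sax.EntityResolver.
            builder.setEntityResolver(resolver);

            String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \""
                + XHTML_DTD + "\"><html><body>a&nbsp;b</body></html>";
            Document doc = builder.parse(new InputSource(new StringReader(page)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // True when &nbsp; was resolved from the local DTD.
        System.out.println(parseWithCatalog().equals("a\u00A0b"));
    }
}
```

The same resolver can be set on the DocumentBuilder from the question before calling parse; nothing else in that code should need to change.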

If you use a build tool such as Maven, you will find plenty of pointers on setting one up.

The advantage over intercepting entities, as the answer linked by @sylvainulg suggests, is that you still receive the correct characters.
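
To illustrate the point about characters: on a JDK without a built-in catalog API (the strace output in the question shows Java 1.6), the same system-ID mapping can be hand-rolled as an EntityResolver that serves the real DTD text rather than an empty one. This is only a sketch, not the OASIS catalog format. The class name LocalDtdResolver and the one-line stand-in DTD are assumptions for illustration, and I am assuming the linked approach hands the parser an empty DTD, which leaves &nbsp; undeclared so the character cannot be recovered.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LocalDtdResolver implements EntityResolver {
    // Maps a system ID to the text of a locally stored DTD or entity file.
    private final Map<String, String> dtdBySystemId = new HashMap<String, String>();

    void register(String systemId, String dtdText) {
        dtdBySystemId.put(systemId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String dtdText = dtdBySystemId.get(systemId);
        if (dtdText == null) {
            // Unknown ID: returning null lets the parser do its default lookup.
            return null;
        }
        InputSource source = new InputSource(new StringReader(dtdText));
        // Keep the original ID so relative includes resolve against the same URL.
        source.setSystemId(systemId);
        return source;
    }

    /** Parses a tiny XHTML page and returns its text content. */
    static String demo() {
        try {
            String dtdId = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";
            LocalDtdResolver resolver = new LocalDtdResolver();
            // Stand-in for the real DTD, which would normally be read from disk.
            resolver.register(dtdId, "<!ENTITY nbsp \"&#160;\">");

            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);

            String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \""
                + dtdId + "\"><html><body>a&nbsp;b</body></html>";
            Document doc = builder.parse(new InputSource(new StringReader(page)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // True when &nbsp; came through as a real non-breaking space.
        System.out.println(demo().equals("a\u00A0b"));
    }
}
```

With the real xhtml1-transitional.dtd you would also register the three entity files it includes (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent) under their w3.org URLs, otherwise the parser goes back to the network for those.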
