Specifically, I was using dom4j to read in KML documents and parse out some of the data in the XML. When I just pass the URL to the reader in string form, it’s simple and handles both file system URLs and web URLs:
SAXReader reader = new SAXReader();
Document document = reader.read(url);
The problem is, sometimes my code will need to handle KMZ documents, which are basically just zipped-up XML (KML) documents. Unfortunately, there’s no convenient way to handle this with the SAXReader. I’ve found all kinds of funky solutions for determining whether a given file is a ZIP file, but my code quickly becomes bloated and nasty: reading the stream, building a file, checking the “magic” hex bytes at the beginning, extracting, etc.
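For reference, the “magic” bytes check on its own can be kept fairly small. Here is a minimal sketch, assuming the stream is wrapped in a BufferedInputStream so it can be rewound after peeking. The names ZipSniffer and looksLikeZip are made up for illustration and are not part of dom4j.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

final class ZipSniffer {
    // Local file header signature "PK\003\004", which starts a non-empty ZIP archive.
    // (An empty archive starts with a different signature and is not detected here.)
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at the first four bytes, then rewinds the stream to where it was. */
    static boolean looksLikeZip(BufferedInputStream in) throws IOException {
        in.mark(ZIP_MAGIC.length);
        try {
            for (int expected : ZIP_MAGIC) {
                // read() returns -1 at end of stream, which never matches
                if (in.read() != expected) {
                    return false;
                }
            }
            return true;
        } finally {
            in.reset();
        }
    }
}
```

Because the stream is reset, the same BufferedInputStream can afterwards be handed on to whatever does the actual reading.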
Is there some quick and clean way to handle this? An easier way to connect to any URL and extract the contents if they’re compressed, otherwise simply grab the XML?
Hmm, it doesn’t seem that KMZDOMLoader handles KMZ files on the web. It’s possible that the KMZ is being loaded dynamically, so it won’t always have (a) a file reference or (b) a .kmz extension specifically; the code will have to decide based on the content type.
What I ended up doing was to build a URL object, then get the protocol. I have separate logic to handle a local file and a document on the web. Then, inside each of those logic blocks, I had to determine whether the document was compressed. The SAXReader read() method can take an input stream, so I found that I could use a ZipInputStream for the KMZs.
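To illustrate just the ZipInputStream step, here is a minimal sketch, assuming the KMZ contains at least one entry whose name ends in .kml and that the first such entry is the one wanted. KmzStreams and firstKmlEntry are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class KmzStreams {
    /**
     * Wraps a zipped stream and positions it on the first .kml entry.
     * Reads from the returned stream yield only that entry's bytes.
     */
    static InputStream firstKmlEntry(InputStream kmz) throws IOException {
        ZipInputStream zip = new ZipInputStream(kmz);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName().toLowerCase(Locale.ROOT);
            if (!entry.isDirectory() && name.endsWith(".kml")) {
                return zip;
            }
        }
        // Assumption: an archive without a .kml entry is treated as an error.
        zip.close();
        throw new IOException("No .kml entry found in archive");
    }
}
```

The returned stream could then be passed to reader.read(...) in place of the URL string, since a ZipInputStream reports end of stream when the current entry is exhausted.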
Here’s the code I ended up with: