When I receive XML data (via a Twitter API call, in this instance), I imagine it’s best practice to somehow validate it before I begin working with it? My app has had a lot of intractable issues lately, and I want to rule out bad XML data.
Does XML ever go “bad” somehow? Would an overloaded server like Twitter’s ever spit out just half of what should come my way?
My real question is twofold: should I validate XML data before I work with it, and how would I go about doing that? (I already know the expected structure of the XML data.)
Thanks!
One last clarification before I select an answer (and thanks for your efforts): If I only need 5 predictable fields out of the static-length XML file, does something like this leave loopholes that creating an XSD overcomes?
if(!isset($xml->id, $xml->text, $xml->created_at, $xml->sender, $xml->recipient)) throw...
The most obvious method of validating your XML would be:
Attempt to load the XML into your favourite DOM container or parse it using some other mechanism (I’m not completely familiar with XML processing in PHP). This would allow you to check if the XML is ‘well formed’. If the XML is not well formed (for example, you only got half the XML response back) then you’d catch the problem at this point and deal with it.
Once you’ve successfully loaded/parsed the XML the next thing is to validate it against an XML schema. Unfortunately Twitter don’t publish XML schemas for their XML so you’d need to roll these yourself.
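For the first step (the well-formedness check), here is a minimal PHP sketch, assuming the SimpleXML and libxml extensions are available. The function name `loadWellFormedXml` is made up for illustration; the libxml calls are standard PHP.

```php
<?php
declare(strict_types=1);

/**
 * Parse a raw XML string and throw if it is not well formed.
 * The function name is hypothetical, not part of any library.
 */
function loadWellFormedXml(string $raw): SimpleXMLElement
{
    // Collect libxml errors ourselves instead of letting PHP emit warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($raw);
    $errors = libxml_get_errors();

    // Restore the previous error-handling mode for the rest of the app.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $messages = array_map(
            static fn (LibXMLError $e): string => trim($e->message) . ' (line ' . $e->line . ')',
            $errors
        );
        throw new RuntimeException('XML is not well formed: ' . implode('; ', $messages));
    }

    return $xml;
}
```

A truncated response such as `<direct_message><id>1</id><text>hi` should end up in the `RuntimeException` branch, so a half-delivered document is caught before any field is read.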
You can create your own XML schemas by hand. Here’s a link that will help you get started:
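As a rough idea of what a hand-rolled schema and the validation step might look like in PHP, here is a sketch built around the five fields mentioned in the question. The root element name `direct_message`, the flat layout, the element order and the chosen types are all assumptions, not Twitter's actual format; a real schema would have to describe every element the response contains. `validateDirectMessage` is a made-up helper name; `DOMDocument::schemaValidateSource` is the standard DOM method.

```php
<?php
declare(strict_types=1);

// Hand-written XSD. Element names come from the question; the root name,
// ordering and types are assumptions for illustration only.
const DIRECT_MESSAGE_XSD = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="direct_message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:positiveInteger"/>
        <xs:element name="text" type="xs:string"/>
        <xs:element name="created_at" type="xs:string"/>
        <!-- No type given, so these accept any content (xs:anyType). -->
        <xs:element name="sender"/>
        <xs:element name="recipient"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

/**
 * Return a list of validation error messages; an empty list means valid.
 * The function name is hypothetical.
 */
function validateDirectMessage(string $raw): array
{
    // DOMDocument::loadXML rejects an empty string, so handle it up front.
    if ($raw === '') {
        return ['Empty response'];
    }

    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    // Step 1: well-formedness (loadXML). Step 2: schema validation.
    $valid = $dom->loadXML($raw) && $dom->schemaValidateSource(DIRECT_MESSAGE_XSD);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($valid) {
        return [];
    }
    return $errors !== [] ? $errors : ['Validation failed without a libxml message'];
}
```

Compared with a plain `isset()` check on the five fields, a schema like this also constrains content (here, that `id` is a positive integer) and rejects unexpected structure, at the cost of having to maintain the schema yourself.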
You can also get tools such as Altova XMLSpy that can ‘infer’ a schema from your XML, i.e. make a best guess at how to define the schema; you may have to tweak it after generation. There are other free tools out there but I’ve only ever used XMLSpy. As Alan says, if Twitter ever change the format of their XML you would need to update your schemas to take account of these changes.
Creating XML Schemas can be daunting at first but once you get the hang of it you’ll find it quite easy. I found this book to be excellent when I first started out: