I’m trying to parse some poorly generated XML with Scala. It looks like this:
<contextfile concordance=brown>
<context filename=br-a01 paras=yes>
<p pnum=1>
<s snum=1>
<wf cmd=ignore pos=DT>The</wf>
</s>
</p>
...
It’s well structured, but as you can see, none of the attribute values are quoted. Simply opening the file with the Scala snippet below fails, not surprisingly:
import scala.xml.XML
val semCor = XML.loadFile(args(0))
throws
org.xml.sax.SAXParseException: Open quote is expected for attribute "{1}" associated with an element type "concordance".
I’d like to know how, if it is at all possible, to set up the Scala XML parser so that it parses this input as if the attribute values were quoted.
Thanks for any suggestions!
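It is not possible to configure the Scala XML parser to that extent. It delegates to a standard SAX parser, and unquoted attribute values make the input not well-formed XML, which a conforming XML parser must reject. However, since your input is malformed anyway, you could use a lenient HTML parser such as JSoup or TagSoup to tidy it first and then parse the result with Scala XML. Or you could extract the data you need from the input with JSoup directly.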
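A minimal sketch of the TagSoup route, assuming the `tagsoup` and `scala-xml` artifacts are on the classpath (the sbt coordinates and versions in the comment are illustrative). `XML.withSAXParser` swaps TagSoup's lenient SAX parser in for the JDK's strict one, so Scala XML receives the unquoted values as ordinary attributes. `TagSoupLoad` and its methods are hypothetical names for this example. Because TagSoup applies an HTML schema, it wraps the document in `<html><body>` and may restructure elements it recognizes as HTML (here `p` and `s`), so it is worth checking the resulting tree on your real files.

```scala
// Assumed sbt dependencies (versions are illustrative):
//   "org.scala-lang.modules" %% "scala-xml" % "2.2.0"
//   "org.ccil.cowan.tagsoup" %  "tagsoup"   % "1.2.1"
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl
import scala.xml.{Elem, XML}

object TagSoupLoad {
  // A Scala XML loader backed by TagSoup's lenient SAX parser instead of
  // the JDK's strict one. A fresh parser is created per call because
  // SAXParser instances are not thread-safe.
  private def lenientLoader = XML.withSAXParser(new SAXFactoryImpl().newSAXParser())

  def fromFile(path: String): Elem = lenientLoader.loadFile(path)

  def fromString(text: String): Elem = lenientLoader.loadString(text)

  def main(args: Array[String]): Unit = {
    // The sample from the question, with the closing tags filled in.
    val sample =
      """<contextfile concordance=brown>
        |<context filename=br-a01 paras=yes>
        |<p pnum=1>
        |<s snum=1>
        |<wf cmd=ignore pos=DT>The</wf>
        |</s>
        |</p>
        |</context>
        |</contextfile>""".stripMargin

    val doc = fromString(sample)
    // TagSoup adds an <html><body> wrapper, so search with \\ rather than \.
    println((doc \\ "contextfile" \ "@concordance").text)
    for (w <- doc \\ "wf") println(s"${w.text}/${(w \ "@pos").text}")
  }
}
```

On the real file, `TagSoupLoad.fromFile(args(0))` would take the place of the original `XML.loadFile(args(0))` call.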