I’m trying to parse meta tags with Scala. I first tried XML matching, like
`html \\ "meta" ...` etc.,
but I get a malformed-XML error because the meta tags on this particular page have neither a closing tag nor a self-closing `/>`.
So for the following HTML,
val html = """<meta name="description" content="This is some meta description">"""
I’m using the following regex matcher:
val metaDescription = """.*meta name="description" content="([^"]+)"""".r
- When I try to match with
val metaDescription(desc) = html
I get a scala.MatchError.
- When I try with
metaDescription.findAllIn(html)
and iterate, I get the whole string, not just the description.
How can I just get the value inside content and nothing else?
EDIT
I got the result I wanted with:
metaDescription.findAllIn(html).matchData foreach {
desc => println(desc.group(1))
}
but that seems like a long way around. Is there a better solution?
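A possible shorter route, offered as a sketch rather than the definitive answer: the extractor form `val metaDescription(desc) = html` requires the regex to match the entire input, and the pattern stops at the closing quote, so the trailing `>` is left unmatched and a scala.MatchError is thrown. Either search for the first match and read group 1, or use an unanchored view of the regex in a pattern match. The object and method names below (`MetaDescription`, `descriptionOf`, `descriptionViaExtractor`) are made up for illustration, and the pattern assumes the attributes appear in exactly this order with double quotes.

```scala
import scala.util.matching.Regex

// Hypothetical helper object; names are illustrative only.
object MetaDescription {
  // (?i) makes the match case-insensitive, so name="Description" and
  // name="description" are both accepted. No leading .* is needed because
  // the methods below search anywhere in the input.
  val metaDescription: Regex =
    """(?i)<meta\s+name="description"\s+content="([^"]*)"""".r

  // Option 1: find the first match anywhere in the input and return group 1.
  def descriptionOf(html: String): Option[String] =
    metaDescription.findFirstMatchIn(html).map(_.group(1))

  // Option 2: an unanchored regex lets the extractor succeed on a partial match.
  private val unanchoredMeta = metaDescription.unanchored

  def descriptionViaExtractor(html: String): Option[String] = html match {
    case unanchoredMeta(desc) => Some(desc)
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    val html = """<meta name="description" content="This is some meta description">"""
    println(descriptionOf(html))
    println(descriptionViaExtractor(html))
  }
}
```

Both variants return an Option, so a page without the tag yields None instead of throwing. A regex like this stays brittle against reordered attributes or single quotes, which is where a real HTML parser helps.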
"Scala XML and TagSoup" shows one way to use TagSoup directly with Scala XML.
If you are open to alternatives, Scales Xml offers a similar approach to parsing HTML via alternative SAX parsers;
example factories for TagSoup and Nu.Validator are provided at that link.