I’m working on an iOS application that uses libxml2 to parse XML retrieved from a backend system. I have the following XML, which is part of a larger XML document:
<properties uiValue="This is a multiline description with text that should wrap but should also preserve any whitespace: like this whitespace.
And preserve newlines.
espace:~` !@#$%^&*()_+=-<>/ \" name="desc">
<values value="This is a multiline description with text that should wrap but should also preserve any whitespace: like this whitespace.
And preserve newlines.
espace:~` !@#$%^&*()_+=-<>/ \"/>
</properties>
As a whole, the document seems to parse OK. The problem is that the newlines are not preserved: when I read the attribute value, the result is:
This is a multiline description with text that should wrap but should also preserve any whitespace: like this whitespace. And preserve newlines. espace:~` !@#$%^&*()_+=-<>/
Is there any way to keep these newlines? If I print the response XML from the server directly, the newlines are there; once the XML goes through the parser, they are stripped out. To complicate matters, this is third-party code that I’m trying to fix, and I haven’t used libxml2 much. The relevant code (I believe) is:
NSLog(@"Response:\n%@", [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease]);
xmlDocPtr doc = xmlReadMemory([data bytes], [data length], NULL, NULL, XML_PARSE_COMPACT | XML_PARSE_NOBLANKS);
xmlNodePtr cur = ....;
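// xmlGetProp returns a newly allocated copy (or NULL); the caller must release it with xmlFree
xmlChar *attrValue = xmlGetProp(cur, (const xmlChar *) "uiValue");
NSString *attrString = attrValue ? [NSString stringWithUTF8String:(const char *)attrValue] : nil;
xmlFree(attrValue);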
I have tried removing the XML_PARSE_COMPACT and XML_PARSE_NOBLANKS options, but that didn’t help (not that I expected it to; as far as I know, those only affect text nodes, not attribute values).
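A conforming XML parser cannot preserve literal line breaks in attribute values. From the spec (XML 1.0, section 3.3.3, Attribute-Value Normalization):
"For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value."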
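To make the rule concrete, here is a simplified sketch in C of that normalization step for a CDATA attribute (the default type when there is no DTD). It is not libxml2’s implementation; the function name normalize_attr_value is hypothetical, and it only handles literal white space, CRLF pairs, and decimal character references below 128, copying any other entity reference through unchanged.

```c
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical, simplified XML 1.0 attribute-value normalization (CDATA only).
 * Input is the raw text between the attribute's quotes.
 * Returns a newly allocated string that the caller must free. */
char *normalize_attr_value(const char *raw) {
    size_t len = strlen(raw);
    char *out = malloc(len + 1);   /* output is never longer than the input */
    if (out == NULL) return NULL;
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        char c = raw[i];
        if (c == '\r' && raw[i + 1] == '\n') {
            /* End-of-line handling folds CRLF into one LF, which then becomes one space */
            i++;
            out[o++] = ' ';
        } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            /* Each literal white space character is replaced by a space */
            out[o++] = ' ';
        } else if (c == '&' && raw[i + 1] == '#' && isdigit((unsigned char)raw[i + 2])) {
            /* Decimal character reference such as &#10;: append the referenced character */
            size_t j = i + 2;
            long code = 0;
            while (isdigit((unsigned char)raw[j]) && code < 128) {
                code = code * 10 + (raw[j] - '0');
                j++;
            }
            if (raw[j] == ';' && code > 0 && code < 128) {
                out[o++] = (char)code;
                i = j;             /* skip past the ';' on the next iteration */
            } else {
                out[o++] = c;      /* not handled by this sketch: copy '&' as-is */
            }
        } else {
            out[o++] = c;
        }
    }
    out[o] = '\0';
    return out;
}
```

Fed the raw text "ws.\nAnd newlines." this returns "ws. And newlines.", while "ws.&#10;And newlines." comes back with a real newline, because a character reference is expanded as part of normalization rather than treated as literal white space.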
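libxml2 performs this normalization while parsing, so by the time you call xmlGetProp the newlines are already gone. You can escape the newlines in the XML with numeric character references, writing each one as &#10;, but if you need to rely on line breaks, it is usually better to put the text in element content than in an attribute.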
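If the backend builds its XML by hand, the escaping can be done when the attribute is written. The following is a minimal sketch in C under that assumption; escape_attr_value is a hypothetical helper, not part of libxml2 or any backend framework. Besides the characters XML requires escaping in a double-quoted attribute (&, <, "), it writes line breaks and tabs as character references.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: escapes text for a double-quoted XML attribute value,
 * writing line breaks and tabs as numeric character references so that
 * attribute-value normalization cannot turn them into spaces.
 * Returns a newly allocated string that the caller must free. */
char *escape_attr_value(const char *text) {
    size_t len = strlen(text);
    /* Worst case: every byte becomes a 6-byte reference such as "&quot;" */
    char *out = malloc(len * 6 + 1);
    if (out == NULL) return NULL;
    char *p = out;
    for (const char *s = text; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
            case '&':  rep = "&amp;";  break;
            case '<':  rep = "&lt;";   break;
            case '"':  rep = "&quot;"; break;
            case '\n': rep = "&#10;";  break;
            case '\r': rep = "&#13;";  break;
            case '\t': rep = "&#9;";   break;
            default:   break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *s;   /* ordinary character: copy unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

With this escaping, "line one\nline two" is sent as "line one&#10;line two". The parser expands &#10; back into a real newline during normalization, so the value returned by xmlGetProp on the client should keep its line breaks without any change to the parsing code.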