I have to parse a String to a Date object in Java.
The string I get follows the pattern MMM d yyyy HH:mm:ss z, with the locale set to French.
The problem occurs when the date is in February, August or December, because of how the French accents are encoded. For example, I get d&#195;&#169;c. 15 2011 16:55:38 CET for December 15th, 2011.
I can't change the way the string is created, so I have to deal with the bad encoding on my side. It seems that when the string is generated it is badly encoded (UTF-8 bytes interpreted as ISO 8859-1) and then escaped.
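To make the suspected bug concrete, here is a minimal Java sketch that reproduces it under the stated assumption: the producer takes the UTF-8 bytes of the date, reads them back as ISO 8859-1, and then writes every non-ASCII character as a decimal numeric character reference. The class name and the escapeNonAscii/garble helpers are hypothetical; the real generator's code is not known.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical stand-in for the producer's escaping step: every non-ASCII
    // char is written out as a decimal numeric character reference (&#N;).
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append("&#").append((int) c).append(';');
            }
        }
        return sb.toString();
    }

    // Assumed reproduction of the bug: the UTF-8 bytes are misread as
    // ISO 8859-1, so each byte >= 0x80 becomes its own char, which is then escaped.
    static String garble(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return escapeNonAscii(misread);
    }

    public static void main(String[] args) {
        // the escape sequence below is e-acute; it keeps the source file ASCII-only
        System.out.println(garble("d\u00e9c. 15 2011 16:55:38 CET"));
        // prints d&#195;&#169;c. 15 2011 16:55:38 CET
    }
}
```

Under this assumption, é (UTF-8 bytes 0xC3 0xA9) turns into the two characters Ã and ©, hence &#195;&#169;, and û (0xC3 0xBB) likewise becomes &#195;&#187;.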
For now I use:
stringFromXML = stringFromXML.replaceAll("&#195;&#169;", "é");
stringFromXML = stringFromXML.replaceAll("&#195;&#187;", "û");
It works because the only accented characters in French month names are é and û, but is there a cleaner way to unescape and convert the characters?
You need two steps:
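1. Resolve numeric character references, for example using StringEscapeUtils, as suggested by Andy.
2. Fix the encoding by treating the characters as UTF-8 code units: encode the string back to bytes with ISO 8859-1 (which maps each character from U+0000 to U+00FF to a single byte) and decode those bytes as UTF-8.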
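The two steps can be sketched in plain Java as follows. To keep it self-contained, unescapeDecimalRefs is a hypothetical stand-in for StringEscapeUtils that only handles decimal references such as &#195; (no hexadecimal or named entities); with Apache Commons on the classpath, StringEscapeUtils.unescapeHtml4 (Commons Lang 3, or its successor in Commons Text) could be used instead. Step 2 relies only on String.getBytes and the String(byte[], Charset) constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrenchDateFix {

    // Matches decimal numeric character references such as &#195;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d+);");

    // Step 1 (hypothetical helper): replace each &#N; with the character N.
    // Very long digit runs or invalid code points will throw; fine for a sketch.
    static String unescapeDecimalRefs(String s) {
        Matcher m = DECIMAL_REF.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = new String(Character.toChars(codePoint));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Step 2: ISO 8859-1 turns each char in U+0000..U+00FF back into the
    // byte it came from; those bytes are then decoded as UTF-8.
    static String fixEncoding(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fromXml = "d&#195;&#169;c. 15 2011 16:55:38 CET";
        String fixed = fixEncoding(unescapeDecimalRefs(fromXml));
        // compare instead of printing, to avoid console-encoding issues
        System.out.println(fixed.equals("d\u00e9c. 15 2011 16:55:38 CET")); // prints true
    }
}
```

This only undoes the damage if every affected string was double-encoded the same way: applied to text that already contains a correct é, step 2 yields the replacement character U+FFFD instead, and getBytes turns any character above U+00FF into '?'.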