I’m looking for a neat and efficient way to replace characters in an XML document. There is a replacement table defined for almost 12,000 UTF-8 characters. Most of them are to be replaced by a single character, but some must be replaced by two or even three characters (e.g. Greek theta should become TH). The documents can be bulky (100MB+). How can I do this in Java? I came up with the idea of using XSLT, but I’m not sure whether this is the best option.
In my experience, String.replace(..) is very slow. I used that API to process 100MB KML files and the performance was just bad. When I switched to a regular expression pre-compiled once with Pattern.compile(..), it worked a whole lot faster.
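For illustration, here is a minimal sketch of what the pre-compiled approach might look like when combined with a lookup table. The class name CharReplacer and the table contents are hypothetical, not taken from the question. It assumes a non-empty table that maps each source character (as a String) to its replacement, and it makes a single pass over the input instead of one String.replace(..) call per table entry.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: compiles the replacement table into one Pattern, once.
class CharReplacer {
    private final Map<String, String> table;
    private final Pattern pattern;

    CharReplacer(Map<String, String> table) {
        if (table.isEmpty()) {
            throw new IllegalArgumentException("replacement table must not be empty");
        }
        this.table = table;
        // Build an alternation of all keys; Pattern.quote guards against
        // keys that happen to be regex metacharacters.
        String alternation = table.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile(alternation);
    }

    // Replaces every occurrence of a table key in a single pass over the input.
    String replaceAll(String input) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer(input.length());
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in replacements literal.
            m.appendReplacement(out, Matcher.quoteReplacement(table.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For a 100MB+ document you would probably not load everything into one String. One option is to read the file through a BufferedReader, call replaceAll on each line or chunk, and write the result out as you go. Note that this sketch treats the XML as plain text, so it would also rewrite matching characters inside tag and attribute names.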