I have the following string stored in the database which is in Unicode format.
كنت قد دخلت بالفعل في مكان آخر من
Now, I want to convert that string into a readable format. In Java, how can I do that?
Since these are HTML entities, you need some sort of library method that will resolve them into the characters that they represent.
Apache Commons has StringEscapeUtils.unescapeHtml, for example (named unescapeHtml4 in Commons Lang 3), and I’m sure there are plenty of others.
If you really want to roll something yourself, for this particular case you could tokenise the numbers between &# and ;, parse them as a decimal int (or as hex when the reference starts with &#x), and call Character.toChars to convert them to Java characters. It’ll take more work and contain more bugs than using a library, though, and I’m sure there are edge cases in the spec which I’m glossing over.
Either approach should give you the decoded text.
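If you do go the roll-your-own route, a minimal sketch might look something like this. The class name NumericEntityDecoder and the sample input are made up for illustration, and it only handles numeric references such as &#1603; or &#x643;; named entities like &amp; and the other corners of the HTML spec are deliberately ignored, so treat it as a starting point rather than a replacement for a library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric character references only.
final class NumericEntityDecoder {
    // Group 1 captures hex digits (&#x643;), group 2 captures decimal digits (&#1603;).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(input, last, m.start());
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.append(Character.toChars(codePoint));
                } else {
                    out.append(m.group()); // out of Unicode range: leave untouched
                }
            } catch (NumberFormatException e) {
                out.append(m.group()); // too many digits for an int: leave untouched
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample input: three decimal references for Arabic letters.
        System.out.println(decode("&#1603;&#1606;&#1578;"));
    }
}
```

Anything that does not look like a well-formed numeric reference is passed through unchanged, which seems a safer default than throwing on input you did not expect.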
(By the way, I think you should be more specific about what you mean by ‘readable format’. I can read that string right now – it’s a sequence of entity references. You’re a developer, be precise!)