I have a string coming in, which I need to store in the db. The string contains the copyright symbol ©. I want to convert this to the HTML entity &copy; so that it is displayed properly in every browser and with every encoding.
This is what I've tried so far:
– tried replace(), which wouldn't have worked for the copyright character anyway.
– tried setting different encodings to view the data in the browser; it gets displayed as a �
– converted the string to a byte array with the UTF-8 charset and found that the copyright character starts with the byte value -62. The problem is that the incoming string could be quite big, and splitting it into a byte array and then building a string back would be very expensive.
Any help is appreciated.
HTML-Escaping
This may not solve your encoding issues, but it answers the question in your title.
To HTML-escape a String I recommend StringEscapeUtils from Apache Commons Lang.
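As a rough illustration of what HTML-escaping does, here is a minimal dependency-free sketch. The class and method names (HtmlEntityEscaper, escapeNonAscii) are made up for this example and are not part of Commons Lang. It emits numeric character references such as &#169; rather than named entities such as &copy;; browsers treat both the same way. It walks the string once, so no byte array is needed even for large inputs.

```java
public final class HtmlEntityEscaper {

    private HtmlEntityEscaper() {}

    /**
     * Hypothetical helper: replaces every non-ASCII code point, plus the
     * HTML special characters < > & ", with a numeric character reference.
     */
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); ) {
            // Read whole code points so surrogate pairs are not split.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (cp > 127 || cp == '<' || cp == '>' || cp == '&' || cp == '"') {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00A9 is the copyright sign, written as an escape so the
        // source file's own encoding does not matter.
        System.out.println(escapeNonAscii("Copyright \u00A9 2012"));
        // prints: Copyright &#169; 2012
    }
}
```

If Commons Lang 3 is on the classpath, StringEscapeUtils.escapeHtml4(input) should cover the same ground and produce the named entity &copy; instead.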
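Encoding
To address your encoding problems: when you want to use UTF-8, ensure that at least one of the following is set. If you set more than one of them, all of them have to be consistent.
– Content-Type in the HTTP header: Content-Type: text/html; charset=utf-8
– HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
– HTML 5: <meta charset="utf-8">
– XHTML: <?xml version="1.0" encoding="UTF-8"?>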
Also ensure that the content you're serving is really UTF-8 encoded. I recommend using UTF-8 encoding without a BOM.
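To see why the declared encoding and the actual bytes have to agree, here is a small sketch using only the JDK. The class name Utf8RoundTrip and its toHex helper are invented for this example. It also shows where the -62 from the question comes from: it is the first of the two UTF-8 bytes for ©, read as a signed Java byte.

```java
import java.nio.charset.StandardCharsets;

public final class Utf8RoundTrip {

    private Utf8RoundTrip() {}

    /** Hypothetical helper: formats bytes as unsigned hex, e.g. "C2 A9". */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // the copyright sign

        // In UTF-8 the copyright sign takes two bytes.
        byte[] utf8 = copyright.getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8)); // C2 A9
        System.out.println(utf8[0]);     // -62 (0xC2 as a signed byte)

        // In ISO-8859-1 it is the single byte A9. Decoding that byte as
        // UTF-8 is invalid, so Java substitutes U+FFFD, the replacement
        // character that browsers render as a question-mark glyph.
        byte[] latin1 = copyright.getBytes(StandardCharsets.ISO_8859_1);
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(misread.equals("\uFFFD")); // true
    }
}
```

The same mismatch can happen between the server and the browser: if the bytes are ISO-8859-1 but the page is declared as UTF-8 (or the other way round), the character will not survive.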