I have a simple form where I can type some characters. These characters are sent to a servlet, which calls getBytes() and prints the bytes. The correct UTF-8 bytes for "ã" are -61 and -93, but I get -52 and -93. 🙁
I have tried everything to understand and fix this, but nothing worked. Everything on my machine should be UTF-8, so I suspect it has to do with the US International keyboard I have been using for 20 years.
Does any smart soul have a clue where -52 and -93 are coming from?
FIXED on Jetty: See my answer below.
BROKEN on Tomcat: How do I get Tomcat to understand the MacRoman (x-mac-roman) charset from my Mac keyboard?
That is the Mac OS Roman character encoding. (0xCC == -52 as a signed Java byte, and 0xCC is "Ã" in Mac OS Roman.)
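One chain of conversions that reproduces exactly -52 and -93 can be sketched as follows. This is an assumption about what happens, not a confirmed diagnosis: the browser sends the UTF-8 bytes C3 A3, the container decodes them as ISO-8859-1 (giving the two characters "Ã£"), and getBytes() then encodes them with a Mac OS Roman default charset. The class and method names are hypothetical, and the sketch assumes the JDK includes the extended "x-MacRoman" charset (the jdk.charsets module).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {

    // Assumption: this JDK ships the extended "x-MacRoman" charset.
    static final Charset MAC_ROMAN = Charset.forName("x-MacRoman");

    /** Bytes a UTF-8 form submits for the given text. */
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Hypothetical failure chain: the UTF-8 request bytes are decoded as
     * ISO-8859-1, then re-encoded with a Mac OS Roman default charset.
     */
    static byte[] misdecoded(String typed) {
        // C3 A3 read as ISO-8859-1 becomes two chars: U+00C3, U+00A3
        String seenByServlet = new String(utf8Bytes(typed), StandardCharsets.ISO_8859_1);
        // In Mac OS Roman, U+00C3 encodes as 0xCC and U+00A3 as 0xA3
        return seenByServlet.getBytes(MAC_ROMAN);
    }

    public static void main(String[] args) {
        String aTilde = "\u00e3"; // escaped so the source file encoding does not matter
        System.out.println(Arrays.toString(utf8Bytes(aTilde)));  // [-61, -93]
        System.out.println(Arrays.toString(misdecoded(aTilde))); // [-52, -93]
    }
}
```

If this matches what you see, the keyboard is not the culprit: the fix is to make every decode and encode step name UTF-8 explicitly.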
Some things to check:
- Use string.getBytes("UTF-8") and new String(bytes, "UTF-8") instead of the overloads that fall back to the platform default charset.
- response.setContentType("text/html; charset=UTF-8");
- In a JSP: <%@page pageEncoding="UTF-8"%>
- <form action="..." accept-charset="UTF-8">

As none of that helped:
Set up request-encoding filtering in your web application (web.xml).
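Why the request encoding matters: with accept-charset="UTF-8", the browser submits "ã" as %C3%A3, and the server must decode those bytes with the right charset. The sketch below uses java.net.URLDecoder as a stand-in for the container's parameter parsing (an assumption for illustration; the container does not necessarily call URLDecoder). A filter would call request.setCharacterEncoding("UTF-8") before any parameter is read; ISO-8859-1 was historically the servlet default when no charset is given. The class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {

    /** Decodes an application/x-www-form-urlencoded value with the named charset. */
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String submitted = "%C3%A3"; // what a UTF-8 form sends for U+00E3

        // Decoded as UTF-8: one char, U+00E3
        String right = decode(submitted, StandardCharsets.UTF_8.name());
        // Decoded as ISO-8859-1: two chars, U+00C3 U+00A3
        String wrong = decode(submitted, StandardCharsets.ISO_8859_1.name());

        System.out.println(right.length() + " " + wrong.length()); // 1 2
    }
}
```

The same two bytes yield one character or two depending solely on the charset chosen at decode time, which is why setting it early in a filter helps.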
Encoding in pom.xml: