UTF-8 is broken in .html files under /web-app when they are served through Tomcat. If I open the file directly with file:///, it renders fine. If I view the file during run-app, it also looks fine. However, when the app is deployed as a WAR, UTF-8 characters appear garbled.
The content-type of the response seems correct…
Content-Type: text/html;charset=UTF-8, and the HTML file itself even seems to have the correct meta declarations.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Tomcat’s connector is set to “UTF-8” for default URIEncoding, so I don’t think that’s the issue either. GSPs are rendering fine; only the HTML file has an issue.
What could be going on here?
EDIT:
Using Firefox, I saved a local copy of the raw HTML as served by Tomcat and a copy of the file read directly from file:///... for comparison. The only difference between the two is that the Tomcat version has every non-ASCII character replaced with this:
�
This renders as either an empty square or a question mark, depending on which editor you're using. The bytes are EF BF BD, which is the UTF-8 encoding of U+FFFD (the Unicode replacement character), and it has replaced every non-ASCII character. Somehow, in serving the file, Tomcat or Grails is stomping on the original bytes. What could do that?
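For reference, here is one way to end up with exactly those bytes. This is a guess at the mechanism, not a diagnosis of what Tomcat or Grails actually does internally: if UTF-8 bytes are decoded with a charset that cannot represent them (US-ASCII is used here purely as an assumed example) and the resulting string is then re-encoded as UTF-8, every undecodable byte comes out as EF BF BD. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Render bytes as space-separated upper-case hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode UTF-8 bytes with the wrong charset (US-ASCII, an assumption),
    // then re-encode the damaged string as UTF-8.
    static byte[] roundTripThroughAscii(byte[] utf8) {
        // Bytes >= 0x80 are not valid US-ASCII; the String constructor
        // substitutes U+FFFD for each of them.
        String decoded = new String(utf8, StandardCharsets.US_ASCII);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println("original: " + hex(original));
        System.out.println("mangled:  " + hex(roundTripThroughAscii(original)));
    }
}
```

If the damaged file shows one EF BF BD per original byte rather than one per original character, that would be consistent with a byte-wise mis-decode like this one; a different count would point to some other path.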
EDIT EDIT:
Even this w3 test file has the same behavior as my files, indicating that my files are probably fine, and there really is something up with Tomcat/Grails.
Grails 1.3.7, at least, cannot serve static HTML files correctly when deployed as a WAR: non-ASCII bytes get replaced on the way out. The workaround is to write your own file-serving controller.
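A minimal sketch of the core of such a controller, written as plain Java so it runs without a servlet container. The idea is to copy the file's bytes to the output stream untouched, never decoding them into a string, so no charset conversion can happen. `RawFileServer`, `serve`, and `roundTripIsByteIdentical` are hypothetical names; in a real Grails controller the stream would presumably be `response.outputStream`, with `response.contentType` set to the value below.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class RawFileServer {
    // Header value to send alongside the bytes (same as in the question).
    static final String CONTENT_TYPE = "text/html;charset=UTF-8";

    // Copy the file to the stream byte for byte; returns the byte count.
    static long serve(Path file, OutputStream out) {
        try {
            return Files.copy(file, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: write content to a temp file, serve it into memory,
    // and report whether the served bytes match the original exactly.
    static boolean roundTripIsByteIdentical(byte[] content) {
        try {
            Path tmp = Files.createTempFile("raw-serve-demo", ".html");
            try {
                Files.write(tmp, content);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                serve(tmp, out);
                return Arrays.equals(content, out.toByteArray());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<p>caf\u00E9 \u2603</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println("byte-identical: " + roundTripIsByteIdentical(html));
    }
}
```

Streaming raw bytes sidesteps whatever re-encoding the default static-file path applies, at the cost of having to handle caching headers and path validation yourself.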