I have implemented a simple file upload-download mechanism. When a user clicks a file name, the file is downloaded with these HTTP headers:
    HTTP/1.1 200 OK
    Date: Tue, 30 Sep 2008 14:00:39 GMT
    Server: Microsoft-IIS/6.0
    Content-Disposition: attachment; filename=filename.doc;
    Content-Type: application/octet-stream
    Content-Length: 10754
I also support Japanese file names. To do that, I encode the file name with this Java method:
    private String encodeFileName(String name) throws Exception {
        String agent = request.getHeader("USER-AGENT");
        if (agent != null && agent.indexOf("MSIE") != -1) {
            // is IE
            StringBuffer res = new StringBuffer();
            char[] chArr = name.toCharArray();
            for (int j = 0; j < chArr.length; j++) {
                if (chArr[j] < 128) { // plain ASCII char
                    // escape every dot except the one before the extension
                    if (chArr[j] == '.' && j != name.lastIndexOf('.'))
                        res.append("%2E");
                    else
                        res.append(chArr[j]);
                } else { // non-ASCII char: percent-encode its UTF-8 bytes
                    byte[] byteArr = name.substring(j, j + 1).getBytes("UTF8");
                    for (int i = 0; i < byteArr.length; i++) {
                        // byte must be converted to unsigned int
                        res.append('%').append(Integer.toHexString((byteArr[i]) & 0xFF));
                    }
                }
            }
            return res.toString();
        }
        // Firefox/Mozilla
        return MimeUtility.encodeText(name, "UTF8", "B");
    }
It worked well until someone found that it fails for long file names, for example あああああああああああああああ2008.10.1あ.doc. If I change one of the single-byte dots to a single-byte underscore (_), or if I remove the first character, it works. So the problem seems to depend on both the length of the name and the URL-encoding of the dot character. Here are a few examples.
This is broken (あああああああああああああああ2008.10.1あ.doc):
Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008%2E10%2E1%e3%81%82.doc;
This is OK (あああああああああああああああ2008_10.1あ.doc):
Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008_10%2E1%e3%81%82.doc;
This is also fine, with the first あ removed (ああああああああああああああ2008.10.1あ.doc):
Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008%2E10%2E1%e3%81%82.doc;
Anybody have a clue?
Gmail handles file-name escaping somewhat differently: it encloses the file name in double quotes and does not URL-escape single-byte periods. With that encoding, the long file name from the question downloads correctly.
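A minimal sketch of the Gmail-style header described above, assuming that quoting the value and leaving ASCII characters (including periods) unescaped is what matters. The class and method names (`GmailStyleDisposition`, `gmailStyleDisposition`) are hypothetical and this is not Gmail's actual code. It percent-encodes every non-ASCII UTF-8 byte, and also `%`, `"` and `\`, so the quoted string stays unambiguous.

```java
import java.nio.charset.StandardCharsets;

public class GmailStyleDisposition {

    // Hypothetical helper: builds a Content-Disposition value in the style
    // described above (double-quoted name, periods left as-is).
    static String gmailStyleDisposition(String fileName) {
        StringBuilder sb = new StringBuilder("attachment; filename=\"");
        for (byte b : fileName.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean printableAscii = v >= 0x20 && v < 0x7F;
            // Keep printable ASCII (including '.') except characters that would
            // end the quoted string or look like an escape sequence.
            if (printableAscii && v != '"' && v != '\\' && v != '%') {
                sb.append((char) v);
            } else {
                // Every byte of a multi-byte UTF-8 character is >= 0x80,
                // so each one is escaped here as %xx.
                sb.append(String.format("%%%02x", v));
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String name = "\u3042".repeat(15) + "2008.10.1\u3042.doc";
        System.out.println(gmailStyleDisposition(name));
    }
}
```

For the long name from the question, the encoded name between the quotes is 157 characters instead of the 161 produced by the `%2E` version, because the two inner periods are no longer expanded to three characters each.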
However, there is still a limit on the byte length of the file name, apparently IE-only and presumably a bug: once the name exceeds roughly 160 bytes, IE truncates its beginning, even when the name consists only of single-byte characters.
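The roughly 160-byte figure is consistent with the three headers in the question. The sketch below re-implements only the IE branch of `encodeFileName` as a standalone method (the names `EncodedLengthCheck` and `encodeForIE` are hypothetical, and the `request` dependency is dropped) and prints the encoded length of each example name. Assuming the limit applies to this encoded value, only the broken name exceeds it.

```java
import java.nio.charset.StandardCharsets;

public class EncodedLengthCheck {

    // Same rules as the IE branch of encodeFileName: escape every '.' except
    // the last one as %2E, and percent-encode non-ASCII chars as UTF-8.
    // Like the original, it works one UTF-16 char at a time, so it assumes
    // the name contains no surrogate pairs.
    static String encodeForIE(String name) {
        StringBuilder res = new StringBuilder();
        int lastDot = name.lastIndexOf('.');
        for (int j = 0; j < name.length(); j++) {
            char c = name.charAt(j);
            if (c < 128) {
                res.append(c == '.' && j != lastDot ? "%2E" : String.valueOf(c));
            } else {
                byte[] bytes = name.substring(j, j + 1).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    res.append('%').append(Integer.toHexString(b & 0xFF));
                }
            }
        }
        return res.toString();
    }

    public static void main(String[] args) {
        String a = "\u3042"; // the Japanese character used in the examples
        String[] names = {
            a.repeat(15) + "2008.10.1" + a + ".doc", // broken in IE
            a.repeat(15) + "2008_10.1" + a + ".doc", // OK
            a.repeat(14) + "2008.10.1" + a + ".doc", // OK
        };
        for (String n : names) {
            System.out.println(encodeForIE(n).length()); // prints 161, 159, 152
        }
    }
}
```

Each あ becomes the nine characters `%e3%81%82` and each escaped dot becomes three, which is why the broken name lands just above 160 while the other two stay below it. The exact threshold is only known approximately, so any server-side length check based on this would be a heuristic.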