I have Django running on a standard WSGI/Apache httpd combo.
I noticed that file output was different when I ran code in the shell vs. from the browser. I’ve isolated out everything else and am still getting the same problem.
Here’s the code:
def test_antiword(filename):
    import subprocess
    # Write antiword's output straight to a file.
    with open(filename, 'w') as writefile:
        subprocess.Popen(["antiword", '/tmp/test.doc'], stdout=writefile)
    # Capture the same output through a pipe so the raw bytes can be inspected.
    p = subprocess.Popen(["antiword", '/tmp/test.doc'], stdout=subprocess.PIPE)
    out, _ = p.communicate()
    ords = []
    for kk in out:
        ords.append(ord(kk))
    return out, ords

def test_antiword_view(request):
    from django.http import HttpResponse
    return HttpResponse(repr(test_antiword('/tmp/web.txt')))
When I open the URL in the browser, this is the output:
('\n"I said good day sir. Good day!" shouted Sh\xe9rlo\xe7k H\xf8lme\xa3.\n\n "Why not Zoidberg?" queried Zoidberg.\n', [10, 34, 73, 32, 115, 97, 105, 100, 32, 103, 111, 111, 100, 32, 100, 97, 121, 32, 115, 105, 114, 46, 32, 71, 111, 111, 100, 32, 100, 97, 121, 33, 34, 32, 115, 104, 111, 117, 116, 101, 100, 32, 83, 104, 233, 114, 108, 111, 231, 107, 32, 72, 248, 108, 109, 101, 163, 46, 10, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 87, 104, 121, 32, 110, 111, 116, 32, 90, 111, 105, 100, 98, 101, 114, 103, 63, 34, 32, 113, 117, 101, 114, 105, 101, 100, 32, 90, 111, 105, 100, 98, 101, 114, 103, 46, 10])
This is the corresponding output when I call test_antiword('/tmp/shell.txt') in the shell:
('\n\xe2\x80\x9cI said good day sir. Good day!\xe2\x80\x9d shouted Sh\xc3\xa9rlo\xc3\xa7k H\xc3\xb8lme\xc2\xa3.\n\n \xe2\x80\x9cWhy not Zoidberg?\xe2\x80\x9d queried Zoidberg.\n', [10, 226, 128, 156, 73, 32, 115, 97, 105, 100, 32, 103, 111, 111, 100, 32, 100, 97, 121, 32, 115, 105, 114, 46, 32, 71, 111, 111, 100, 32, 100, 97, 121, 33, 226, 128, 157, 32, 115, 104, 111, 117, 116, 101, 100, 32, 83, 104, 195, 169, 114, 108, 111, 195, 167, 107, 32, 72, 195, 184, 108, 109, 101, 194, 163, 46, 10, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 226, 128, 156, 87, 104, 121, 32, 110, 111, 116, 32, 90, 111, 105, 100, 98, 101, 114, 103, 63, 226, 128, 157, 32, 113, 117, 101, 114, 105, 101, 100, 32, 90, 111, 105, 100, 98, 101, 114, 103, 46, 10])
As you can see, the output is very different. For one thing, the shell output maintains the whitespace that was in the original file; it’s lost in the web version.
As you can see in the code, I also output the documents to files. The generated output is below:
web.txt
"I said good day sir. Good day!" shouted Sh?rlo?k H?lme?.
"Why not Zoidberg?" queried Zoidberg.
shell.txt
“I said good day sir. Good day!” shouted Shérloçk Hølme£.
“Why not Zoidberg?” queried Zoidberg.
In the web version, the accented characters show up as unrecognized and the file command identifies the encoding as ISO-8859. In the shell version, the characters display correctly and the file command identifies the encoding as UTF-8.
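A quick check of the byte values listed above is consistent with what file reports. The sketch below only uses the bytes from the two outputs; the variable names are illustrative. It decodes the web bytes as ISO-8859-1 and the shell bytes as UTF-8 and confirms both give the same text, which suggests the two runs differ only in output encoding.

```python
# Bytes for "Shérloçk Hølme£" copied from the two repr() outputs above.
web_bytes = b"Sh\xe9rlo\xe7k H\xf8lme\xa3"                      # from the browser
shell_bytes = b"Sh\xc3\xa9rlo\xc3\xa7k H\xc3\xb8lme\xc2\xa3"    # from the shell

# Decode each with the encoding that `file` reported for it.
web_text = web_bytes.decode("iso-8859-1")
shell_text = shell_bytes.decode("utf-8")

# Both decode to the same Unicode text, so only the encoding differs.
same_text = (web_text == shell_text)
```

The curly quotes are a separate case: ISO-8859-1 has no code points for them, which would explain why they appear as plain ASCII double quotes (byte 34) in the web output.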
I am at a loss as to why this could be happening. I’ve checked, and both processes are using the same version of antiword. In addition, I’ve verified that they are both using the same Python module file for subprocess. The version of Python is also exactly the same in both cases.
Can anyone explain what might be going on?
The difference is likely due to an environment variable. According to the man page:
I suspect that what’s happening is that when you run it from your shell, your shell is in a UTF-8 locale, but when you run it from Django, it’s in a different locale, and it can’t properly convert the Unicode characters. Try switching into a UTF-8 locale when running the subprocess like this:
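A minimal sketch of that idea, under some assumptions: the locale name en_US.UTF-8 is only an example and must actually be installed on the server, and the helper names utf8_env and run_antiword are made up for illustration. It copies the current environment, overrides the locale variables, and passes the result to the subprocess through the env argument of subprocess.Popen.

```python
import os
import subprocess


def utf8_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of an environment mapping with a UTF-8 locale forced.

    `locale_name` is an assumption; use a UTF-8 locale that exists on your
    system (check with `locale -a`).
    """
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides the other locale variables; LANG is set as well so
    # the two stay consistent.
    env["LC_ALL"] = locale_name
    env["LANG"] = locale_name
    return env


def run_antiword(doc_path, locale_name="en_US.UTF-8"):
    """Run antiword on doc_path with a UTF-8 locale and return its raw output."""
    p = subprocess.Popen(
        ["antiword", doc_path],
        stdout=subprocess.PIPE,
        env=utf8_env(locale_name=locale_name),
    )
    out, _ = p.communicate()
    return out
```

Calling run_antiword('/tmp/test.doc') from the Django view should then produce the same bytes as the shell run, provided the locale really is the cause. Printing os.environ from inside the view is a simple way to confirm which locale variables Apache passes down.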