I am using Graphviz’s dot to generate some SVG graphs for a web application. I call dot using Popen:
p = subprocess.Popen(u'/usr/bin/dot -Kfdp -Tsvg', shell=True,\
stdin=subprocess.PIPE, stdout=subprocess.PIPE)
str = u'long-unicode-string-i-want-to-convert'
(stdout,stderr) = p.communicate(str)
What happens is that the dot program throws errors like:
Error: not well-formed (invalid token) in line 1
... <tr><td cellpadding="4bgcolor="#EEE8AA"> ...
in label of node n260
That obvious error is most certainly NOT in the input string. In particular, if I save it to str.txt with utf-8 encoding and do
/usr/bin/dot -Kfdp -Tsvg < str.txt > myimg.svg
I get the desired output. The only ‘special’ thing about str is that it contains characters like the Danish øæå.
Right now I have no clue what I should do. The problem may very well be in dot, but it certainly seems to be triggered by Popen behaving differently from using < in the shell, and I have no idea where to begin. Any help or ideas for alternative ways of calling dot (besides writing all the data to a file and calling it on that!) would be much appreciated!
Sounds like you should be doing:
(stdout, stderr) = p.communicate(str.encode('utf-8'))
(except, of course, that you shouldn’t shadow the builtin str.) The unicode type in Python holds Unicode data, not UTF-8. If you want UTF-8, you need to explicitly encode it.
On top of that, there’s no reason to use shell=True in that snippet, nor is the unicode literal passed to subprocess.Popen a particularly good idea (it just gets encoded to ASCII anyway). And the backslash at the end is unnecessary: Python knows the line is continued, because you have an open parenthesis that hasn’t been closed yet. So, use:
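A minimal sketch of that advice, putting the pieces together: the command is passed as a list (so no shell=True), there is no trailing backslash, and the text is explicitly encoded to UTF-8 before it is written to dot’s stdin. The function name render_svg and its cmd parameter are hypothetical, introduced only for illustration; decoding the output as UTF-8 assumes dot emits UTF-8 SVG.

```python
import subprocess


def render_svg(dot_source, cmd=("/usr/bin/dot", "-Kfdp", "-Tsvg")):
    """Pipe dot_source (text) to cmd as UTF-8 bytes and return its stdout as text.

    render_svg and cmd are illustrative names, not part of any library.
    """
    # A list of arguments avoids shell=True; the open parenthesis lets the
    # call span several lines without a trailing backslash.
    p = subprocess.Popen(list(cmd),
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # The pipe carries bytes, so encode the text explicitly as UTF-8.
    stdout, _ = p.communicate(dot_source.encode("utf-8"))
    # Assumption: the child process writes UTF-8 to its stdout.
    return stdout.decode("utf-8")
```

With dot installed at that path, render_svg(u'digraph { a -> b }') should return the SVG markup as text; passing a different cmd makes it possible to try the encoding round trip without Graphviz.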