I have a Python application that gets multilingual information from websites and presents it in a small GUI window (wxPython based).
I (currently) don’t use any specific unicode statements in my source files.
Now, when I run my Python application from within Eclipse, French characters (like ë) are displayed nicely. When I run it from a py2exe packaged version, the characters go wonky.
I don’t really understand why as the building with py2exe doesn’t produce unicode or encoding related errors.
However, to fix this issue, and following this article, I wrapped my strings in a unicode(my_string, "utf-8") call just before outputting them to screen. This solves it.
Questions:
- Is wrapping strings in a unicode() call just before displaying them the good way to do it?
- Why does it work without the unicode conversion from within Eclipse, but not from a Windows packaged .exe version?
I tried wrapping my head already many times around unicode, but it seems I am not unicode compatible 😐
The best approach is to ensure the strings are unicode as early as possible. If the library you are scraping websites with is not providing you with unicode, then it is not doing what it should (imho). In that case you have to decode the strings to unicode yourself, using the same encoding that the web page you are scraping uses.
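A minimal sketch of what "decode early" could look like, assuming you know (or can look up) the encoding each page declares. The helper name to_text and the sample byte strings are made up for illustration; they are not from your code or from any particular scraping library. The point is that two differently encoded pages end up as the same unicode string once each is decoded with its own encoding.

```python
# -*- coding: utf-8 -*-

def to_text(raw_bytes, encoding):
    """Decode raw bytes from a web page into a unicode string.

    Hypothetical helper: 'encoding' is whatever the page declares
    (HTTP Content-Type header or a <meta> tag), not a guess.
    """
    return raw_bytes.decode(encoding)


# Hypothetical payloads: the word "Noël" as sent by two different sites.
utf8_page = b"No\xc3\xabl"    # ë encoded as UTF-8 (two bytes)
latin1_page = b"No\xebl"      # ë encoded as ISO-8859-1 (one byte)

# Decode right after receiving the data, each with its own encoding.
text_a = to_text(utf8_page, "utf-8")
text_b = to_text(latin1_page, "iso-8859-1")

# From here on the rest of the application only sees unicode strings,
# and both pages yield the same four-character text.
same_text = (text_a == text_b)
```

After that single decode step, the GUI code no longer needs to care where a string came from or how it was encoded on the wire.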
Your approach is basically the opposite: decoding as late as possible. That it has worked so far is pure luck, because you have not encountered any non-UTF-8 strings yet. Any ISO-8859-1 string containing non-ASCII characters will break your app.
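To illustrate that failure mode, here is a small sketch with a made-up ISO-8859-1 byte string. Decoding it as UTF-8 raises UnicodeDecodeError, which is the same error a unicode(my_string, "utf-8") call would hit on such input in Python 2. The variable names are only for the example.

```python
# -*- coding: utf-8 -*-

# Hypothetical input: "Noël" as an ISO-8859-1 encoded page would send it.
latin1_bytes = b"No\xebl"

try:
    # Assuming UTF-8 for everything works only until data like this shows up.
    latin1_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the encoding the data really uses succeeds.
decoded = latin1_bytes.decode("iso-8859-1")
```

Here failed ends up True, while decoding with the correct encoding gives the expected text.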