I have a list of strings in different encodings: some are cp1251, some are ASCII, and some are something else. I need to convert them all to Unicode, because I get a UnicodeDecodeError, especially when I try to dump the data to JSON.
How can I do this?
You can use chardet to detect the encoding of a string, so one way to convert a list of them to unicode (in Python 2.x) would be:
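A minimal sketch of that idea, not the original snippet. It assumes the third-party `chardet` package is installed; the names `to_unicode`, `to_unicode_list`, `detect` and `fallback` are made up for illustration. `chardet.detect()` takes a byte string and returns a dict whose `'encoding'` entry can be `None` when it cannot guess, so the sketch falls back to UTF-8 in that case:

```python
def to_unicode(raw, detect=None, fallback="utf-8"):
    """Return `raw` as text, guessing the encoding of byte strings."""
    if not isinstance(raw, bytes):
        return raw  # already text (unicode), nothing to decode
    if detect is None:
        # Imported lazily so a caller can inject a different detector.
        import chardet  # third-party: pip install chardet
        detect = chardet.detect
    guess = detect(raw)  # dict with 'encoding' and 'confidence' keys
    encoding = guess.get("encoding") or fallback  # None means "no guess"
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: decode lossily instead of failing.
        return raw.decode(fallback, "replace")


def to_unicode_list(strings, detect=None):
    """Apply to_unicode() to every item of a list."""
    return [to_unicode(s, detect) for s in strings]
```

Because `bytes` is an alias for `str` in Python 2, the same check works there and in Python 3. Keep in mind that detection on short strings is only a guess.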
… which you’d use like this:
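A self-contained usage sketch, aimed at the JSON dump from the question. The helper names `decode_guess` and `dump_strings` are hypothetical, and `detect` is expected to be `chardet.detect` or anything with the same return shape:

```python
import json


def decode_guess(raw, detect):
    """Decode a byte string with the encoding that `detect` reports."""
    if not isinstance(raw, bytes):
        return raw  # already text
    encoding = detect(raw)["encoding"] or "utf-8"  # None means "no guess"
    return raw.decode(encoding, "replace")


def dump_strings(strings, detect):
    """Convert every item to text, then serialise the list as JSON."""
    decoded = [decode_guess(s, detect) for s in strings]
    # ensure_ascii=False keeps Cyrillic readable instead of \uXXXX escapes.
    return json.dumps(decoded, ensure_ascii=False)


# Typical call, assuming chardet is installed and `my_strings` is your list:
#     import chardet
#     print(dump_strings(my_strings, chardet.detect))
```

Once every item is text, `json.dumps` no longer has to guess how to decode bytes, which is where the UnicodeDecodeError came from.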
CAVEAT: Solutions like chardet should only be used as a last resort (for instance, when repairing a dataset that’s corrupt because of past mistakes). Encoding detection is a statistical guess, so it is far too fragile to be relied on in production code; instead, as @bames53 points out in the comments to this answer, you should fix the code that corrupted the data in the first place.
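As a rough illustration of fixing it at the source: decode once, at the point where the bytes enter the program, using the encoding you know the data was written in. The function names below and the choice of cp1251 in the usage note are assumptions for the example, not something taken from the question:

```python
import io
import json


def read_lines(path, encoding):
    """Read a text file with an explicitly stated encoding, returning text."""
    # io.open behaves the same on Python 2 and 3 and decodes while reading.
    with io.open(path, encoding=encoding) as handle:
        return [line.rstrip(u"\n") for line in handle]


def to_json(lines):
    """Serialise already-decoded text; no guessing is needed at this point."""
    return json.dumps(lines, ensure_ascii=False)
```

For a file known to be cp1251 you would call `to_json(read_lines(path, "cp1251"))`. A wrong encoding then fails at the input boundary, where it is easy to trace, instead of deep inside the JSON dump.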