I received some text that is encoded, but I don’t know what charset was used. Is there a way to determine the encoding of a text file using Python? How can I detect the encoding/codepage of a text file deals with C#.
Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
EDIT: chardet seems to be unmantained but most of the answer applies. Check https://pypi.org/project/charset-normalizer/ for an alternative
Correctly detecting the encoding all times is impossible.
(From chardet FAQ:)
There is the chardet library that uses that study to try to detect encoding. chardet is a port of the auto-detection code in Mozilla.
You can also use UnicodeDammit. It will try the following methods: