Does anyone know of a Windows app that can scan through a directory and check which scripts are/aren’t encoded as a specified charset (UTF-8 in this case)? I could do it manually, but that could take a while and is quite error-prone!
UTF-8 isn’t a character set; it’s an encoding for Unicode characters. And since this is not programming-related, I’m nudging it over to Super User.
If you do want to write a program for detecting those sequences, it’s pretty easy:
Then, provided the first octet is legal, just remember that the number of octets forming a code point can be obtained by counting the number of 1 bits before the first 0 bit (a first octet that starts with a 0 bit is a single-octet ASCII character). For example, 11110xxx is the start of a 4-octet sequence, so you should skip ahead 4 octets once you’ve established its legality.

The other thing to do is ensure that all continuation octets start with 10.
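
For what it's worth, here is a minimal Python sketch of the check described above, plus a directory walk for the original question. The function names (leading_ones, looks_like_utf8, find_non_utf8_files) are made up for illustration. It only checks structure (lead octet, sequence length, continuation octets), so it will not reject overlong encodings or surrogate code points; Python's built-in bytes.decode("utf-8") is stricter if you need that.

```python
import os


def leading_ones(octet):
    """Count the 1 bits before the first 0 bit of an octet (0..255)."""
    count = 0
    mask = 0x80
    while mask and octet & mask:
        count += 1
        mask >>= 1
    return count


def looks_like_utf8(data):
    """Structural UTF-8 check on a bytes object (hypothetical helper).

    Assumption: only lead/continuation octet patterns are verified;
    overlong forms and surrogates are NOT rejected.
    """
    i = 0
    n = len(data)
    while i < n:
        ones = leading_ones(data[i])
        if ones == 0:
            length = 1            # 0xxxxxxx: single octet (ASCII)
        elif 2 <= ones <= 4:
            length = ones         # 110xxxxx, 1110xxxx, 11110xxx
        else:
            return False          # stray 10xxxxxx or illegal lead octet
        if i + length > n:
            return False          # sequence is cut off at end of data
        # Every continuation octet must start with the bits 10.
        for j in range(i + 1, i + length):
            if data[j] & 0xC0 != 0x80:
                return False
        i += length
    return True


def find_non_utf8_files(root):
    """Return paths under root whose contents fail the structural check."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                if not looks_like_utf8(handle.read()):
                    bad.append(path)
    return bad
```

Something like find_non_utf8_files(r"C:\path\to\scripts") (path is just a placeholder) would then list the files worth a closer look. Keep in mind that a pure-ASCII file passes too, since ASCII is a subset of UTF-8.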