I read a multilingual string from a file encoded in windows-1251, for example s = "qwe абв" (the second word is Russian), and then run:
    for i in s.decode('windows-1251').encode('utf-8').split():
        print i, len(i)
and I get:
qwe 3
абв 6
Why is the length of абв 6 and not 3?
In many programming languages you can't always think of a string as a sequence of characters, because it is often actually a sequence of bytes (in Python 2, `str` is a byte string). Not every character or symbol fits in 8 bits, so character encodings define rules for combining multiple bytes into a single character.
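To see how many bytes each character needs, here is a minimal sketch assuming Python 3, where `str.encode()` returns a `bytes` object whose length is a byte count. The helper name `encoded_sizes` is hypothetical. Latin letters take 1 byte in both encodings, while Cyrillic letters take 1 byte in windows-1251 (a single-byte code page) but 2 bytes in UTF-8.

```python
def encoded_sizes(ch):
    """Return how many bytes a single character occupies in two encodings."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "windows-1251": len(ch.encode("windows-1251")),
    }

# Print the per-character byte counts for the string from the question.
for ch in "qwe абв":
    if ch.isspace():
        continue
    print(repr(ch), encoded_sizes(ch))
```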
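In the case of the string 'абв' encoded in UTF-8, what you have is 6 bytes that represent 3 characters (each of these Cyrillic letters takes 2 bytes in UTF-8). If you want to count characters instead of bytes, make sure you take the length of a unicode string: for example, drop the .encode('utf-8') call so that split() and len() operate on the decoded unicode object.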
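In Python 3 the same distinction is explicit: `str` holds characters and `bytes` holds raw bytes. The sketch below assumes Python 3 and simulates the file contents with an in-memory byte string instead of reading a real file; the names `raw`, `text`, and `word_lengths` are hypothetical. It reproduces the question's byte-counting result and then the character-counting fix.

```python
# Simulated file contents: what reading a windows-1251 file in binary mode yields.
raw = "qwe абв".encode("windows-1251")

# Decode once to get text (a sequence of characters).
text = raw.decode("windows-1251")

def word_lengths(words):
    """Pair each word with its len(); for bytes that is a byte count."""
    return [(w, len(w)) for w in words]

# Re-encoding to UTF-8 before splitting measures bytes, like the question's code.
byte_words = text.encode("utf-8").split()
print(word_lengths(byte_words))  # [(b'qwe', 3), (b'\xd0\xb0\xd0\xb1\xd0\xb2', 6)]

# Splitting the decoded text measures characters.
char_words = text.split()
print(word_lengths(char_words))  # [('qwe', 3), ('абв', 3)]
```

Encode back to UTF-8 only at the point where you write the text out, not before measuring it.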