Possible Duplicate: Python returning the wrong length of string when using special characters I

Question

Editorial Team

Asked: June 12, 2026 (2026-06-12T12:05:14+00:00)

Possible Duplicate:
Python returning the wrong length of string when using special characters

I read a multilingual string from a file encoded in windows-1251, for example s="qwe абв" (the second part is in Russian), and then run:

for i in s.decode('windows-1251').encode('utf-8').split():
  print i, len(i)

and I get:

qwe 3
абв 6

Oh God, why? o_O

1 Answer

Editorial Team · Answer 1 · 2026-06-12T12:05:19+00:00

In programming languages you can’t always think of strings as a sequence of characters, because often (as with a plain str in Python 2) they are actually a sequence of bytes. Not every character or symbol fits in 8 bits, so character encodings define rules for combining multiple bytes into a single character.

In the case of the string 'абв' encoded in UTF-8, what you have is 6 bytes that represent 3 characters, because each Cyrillic letter takes 2 bytes in UTF-8. If you want to count characters instead of bytes, take the length of a unicode string: call len() on the pieces of s.decode('windows-1251') rather than on the result of encoding it to UTF-8.
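A minimal sketch of the difference, written in Python 3 syntax (the question uses Python 2, where the corresponding types are unicode and str). The name raw is only a stand-in for the bytes read from the file; it is not from the original post.

```python
# Stand-in for the bytes read from the windows-1251 file (assumption:
# the file contains exactly the example string from the question).
raw = "qwe абв".encode("windows-1251")

text = raw.decode("windows-1251")   # text string: a sequence of characters
utf8 = text.encode("utf-8")         # byte string: a sequence of bytes

# len() on pieces of the decoded text counts characters.
char_counts = [(word, len(word)) for word in text.split()]
# char_counts is [("qwe", 3), ("абв", 3)]

# len() on pieces of the UTF-8 bytes counts bytes.
byte_counts = [len(word) for word in utf8.split()]
# byte_counts is [3, 6]: each Cyrillic letter is 2 bytes in UTF-8

# In windows-1251 every character of this string is a single byte.
cp1251_length = len(raw)
# cp1251_length is 7 (3 + 1 space + 3)
```

The same three letters are 3 bytes in windows-1251 and 6 bytes in UTF-8, but always 3 characters once decoded, so in the loop from the question it is enough to drop the .encode('utf-8') step before calling len().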
