I can’t get a grip on how Python handles Unicode in files…

Question

Asked: June 7, 2026

I can’t get a grip on how Python handles Unicode in files…

f = open('test.txt', 'w')
f.write('abc')
f.close()

That gives a file of 3 bytes.

f = open('test.txt', 'w')
f.write('abcé')
f.close()

That gives a file of 5 bytes (the é takes up two bytes, but how does Python know that it must read 2 bytes there?)

f = open('test.txt', 'w')
f.write('abcそ')  # a Japanese character
f.close()

That gives a file of 6 bytes (the そ takes up three bytes, but how does Python know that it must read 3 bytes there?)

So I can understand Unicode taking two bytes, but it is sometimes 1, 2 or 3 bytes, and I fail to see how that works.

1 Answer

Editorial Team · Answer 1 · 2026-06-07T17:13:58+00:00

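Python is writing your file as UTF-8 here. (The default text encoding is platform-dependent, but it is UTF-8 on most modern systems.) UTF-8 is a variable-length encoding: it encodes ASCII characters (code points U+0000-U+007F) using 1 byte, code points U+0080-U+07FF (which includes Latin-1 characters such as é) using 2 bytes, code points U+0800-U+FFFF (which includes Chinese and Japanese characters such as そ) using 3 bytes, and code points U+10000-U+10FFFF using 4 bytes.

As for how Python knows how many bytes to read: the first byte of every UTF-8 sequence announces its length. A byte of the form 0xxxxxxx stands alone, 110xxxxx starts a 2-byte sequence, 1110xxxx a 3-byte sequence, and 11110xxx a 4-byte sequence; every following byte in a sequence has the form 10xxxxxx.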
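To make the byte counts concrete, here is a minimal sketch that encodes the three strings from the question and then recovers each character's length from its first byte alone. The helper name utf8_sequence_length is made up for this illustration; Python's real decoder does the equivalent work internally, and this is not its actual implementation.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length of a UTF-8 sequence, judged only from its first byte.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if lead_byte < 0x80:            # 0xxxxxxx: ASCII, stands alone
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: starts a 2-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: starts a 3-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: starts a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and can never start a sequence
    raise ValueError("not a valid UTF-8 lead byte")


# The three strings from the question, encoded explicitly as UTF-8.
for text in ("abc", "abcé", "abcそ"):
    data = text.encode("utf-8")
    print(len(data), data)
# 3 b'abc'
# 5 b'abc\xc3\xa9'
# 6 b'abc\xe3\x81\x9d'

# Per-character sizes, measured two ways: by encoding, and from the lead byte.
sample = "aéそ😀"
encoded_sizes = {ch: len(ch.encode("utf-8")) for ch in sample}
lead_lengths = {ch: utf8_sequence_length(ch.encode("utf-8")[0]) for ch in sample}
```

Both dictionaries come out identical, which is the point: the first byte alone is enough to tell a reader how many more bytes belong to the same character.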

If you want to use a different encoding, such as UTF-16, you can use str.encode to produce the bytes yourself and write them to a file opened in binary mode:

# Save the string as UTF-16 little-endian
f = open('test.txt', 'wb')  # binary mode, because encode() returns bytes
f.write('abcそ'.encode('utf-16le'))  # Output will be 8 bytes
f.close()
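Another option is to keep the file in text mode and pass the encoding argument to open(), so Python does the encoding on write and the decoding on read. The sketch below assumes a throwaway temporary directory (not in the original) so that it does not overwrite anything.

```python
import os
import tempfile

# Temporary location chosen for this illustration only.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit encoding: write() takes a str, not bytes.
with open(path, "w", encoding="utf-16-le") as f:
    f.write("abcそ")

# UTF-16 uses 2 bytes for each of these 4 characters, and the
# little-endian variant writes no byte order mark.
size_on_disk = os.path.getsize(path)

# Reading back with the same encoding recovers the original string.
with open(path, "r", encoding="utf-16-le") as f:
    round_trip = f.read()

print(size_on_disk, round_trip == "abcそ")  # 8 True
```

Naming the encoding explicitly on both the write and the read side also removes any dependence on the platform default.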
