I have a file, which I read from test. This file is UTF-8.

Asked: June 5, 2026

I have a file named test, which is encoded as UTF-8. In my simple example, it contains only the Danish letter “Ø”.

I then have a Python script that reads this file and, in this example, just prints every line.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import codecs
import sys

filename = sys.argv[1]

f = codecs.open(filename, 'r', 'utf-8')

for lines in f:
  print lines

Call this parse.py. Now when I run ./parse.py test in my terminal I get the following output:

Ø

Calling instead ./parse.py test | less gives me:

Traceback (most recent call last):
  File "./test.py", line 12, in <module>
    print lines
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd8' in position 11: ordinal not in range(128)

I am certain my test file is ‘UTF-8’:

$ file -I test
test: text/plain; charset=utf-8

My $LC_CTYPE is also set to ‘UTF-8’.

What am I doing wrong? How do I get it to work, so that I can pass the output of parse.py to the next command?

1 Answer

Editorial Team · Answer 1 · 2026-06-05T19:51:36+00:00

This is probably a problem with less; see this article for some tips. Changing the configuration of less may help.

If your system supports the UTF-8 encoding of Unicode for non-ASCII text, as many modern systems do, you should either set your locale to something that includes the string “UTF-8” or “UTF8” (either uppercase or lowercase is ok), or set LESSCHARSET to “utf-8”.

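OK, this wasn’t the problem, so I am updating the answer based on the comments.
You need to encode the string before printing it. This article gives the reason; summed up: when standard output is a pipe rather than a terminal, Python cannot detect an output encoding and falls back to ASCII, so it needs to be told how to encode the unicode.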
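
In the script from the question, the smallest change is probably to replace print lines with print lines.encode('utf-8'). The sketch below shows the same idea in a form that should also run on Python 3: it encodes each line explicitly and writes the bytes to standard output. The helper names write_encoded and main are only illustrative, and UTF-8 is assumed to be what the next command in the pipe expects.

```python
# -*- coding: utf-8 -*-
import io
import sys


def write_encoded(lines, out, encoding="utf-8"):
    """Encode each unicode line to bytes and write it to a binary stream."""
    for line in lines:
        out.write(line.encode(encoding))


def main(argv):
    filename = argv[1]
    # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
    # on Python 2, sys.stdout itself accepts byte strings.
    binary_stdout = getattr(sys.stdout, "buffer", sys.stdout)
    # io.open decodes the file as UTF-8, so each line is a unicode string.
    with io.open(filename, "r", encoding="utf-8") as f:
        write_encoded(f, binary_stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

With this change, ./parse.py test | less no longer depends on Python guessing an encoding for the pipe. Each line keeps its own newline, so the extra blank line that print added does not appear. If you would rather not touch the script, setting the PYTHONIOENCODING environment variable to utf-8 before running it is another option worth trying.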
