I've been trying to mass-convert a bunch of text files to UTF-8 in Python

Question


Asked: May 21, 2026


I've been trying to mass-convert a bunch of text files to UTF-8 in Python, and this error keeps popping up. Is there a way to convert them with a Python script or bash commands?
I used this code:

import codecs
import glob
import os

# wrd is assumed to be the directory holding the .txt files
writer = codecs.open(os.path.join(wrd, 'dict.en'), 'wtr', 'utf-8')
for infile in glob.glob(os.path.join(wrd, '*.txt')):
    print infile
    for line in open(infile):
        writer.write(line.encode('utf-8'))

and got these sorts of errors:

Traceback (most recent call last):
  File "dicting.py", line 30, in <module>
    writer.write(line2.encode('utf-8'))
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 216: unexpected code byte


1 Answer

Editorial Team · Answer 1 · 2026-05-21T02:39:14+00:00

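OK, first point: your output file is opened with codecs.open(..., 'utf-8'), so any text written to it is automatically encoded as UTF-8. Don't call encode('utf-8') yourself before passing a line to write(). In Python 2, calling encode() on a byte string (which is what iterating over a plain open() file gives you) makes Python implicitly decode it first, which is why an encode call can end up raising a UnicodeDecodeError.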
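For comparison, here's a minimal sketch of the same rule in Python 3 (an assumption on my part, since your code is Python 2), where the built-in open() with an encoding argument does the job codecs.open() does for you. The helper name write_lines_utf8 and the temp file are just for illustration.

```python
import os
import tempfile


def write_lines_utf8(path, lines):
    # The file object encodes each str as UTF-8 on the way out, so the
    # lines are handed to write() as text, with no .encode() call.
    # newline="\n" keeps "\n" as-is on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as writer:
        for line in lines:
            writer.write(line)


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_lines_utf8(path, ["caf\u00e9\n", "na\u00efve\n"])

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'caf\xc3\xa9\nna\xc3\xafve\n'
```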

So the first thing to try is to simply use the following in your inner loop:

writer.write(line)

If that doesn't work, the problem is almost certainly that you aren't decoding your input files properly: iterating over a plain open() file gives you raw bytes, and nothing tells Python which encoding those bytes are in.
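A quick Python 3 illustration of why decoding matters (again assuming Python 3, and using a made-up byte string rather than your actual file): the 0xa0 byte from your traceback is not valid UTF-8 on its own, but it decodes fine as cp1252, after which the text can be re-encoded as UTF-8.

```python
# Hypothetical input: "10 kg" with a non-breaking space, saved as cp1252.
raw = b"10\xa0kg"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xa0 is a UTF-8 continuation byte, so it can't start a character.
    print("not UTF-8:", exc.reason, "at position", exc.start)

text = raw.decode("cp1252")    # decoding with the file's real encoding works
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)              # b'10\xc2\xa0kg'
```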

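Taking a wild guess that your input files are encoded in cp1252 (byte 0xa0, the one in your traceback, is a non-breaking space in cp1252 and Latin-1 but is never valid on its own in UTF-8), you could try the following in place of your inner loop as a quick test: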

for line in codecs.open(infile, 'r', 'cp1252'):
    writer.write(line)
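If you're on Python 3 (an assumption; your code is Python 2), a sketch of the whole job might look like the following. Rather than betting on one encoding for everything, it tries UTF-8 per file and falls back to cp1252. The function name merge_to_utf8 and the fallback choice are mine, not anything standard.

```python
import glob
import os


def merge_to_utf8(src_dir, out_name="dict.en", fallback="cp1252"):
    """Concatenate every *.txt file in src_dir into one UTF-8 file.

    Each input is decoded as UTF-8 if possible, otherwise as `fallback`
    (cp1252 is only a guess about where the files came from).
    """
    out_path = os.path.join(src_dir, out_name)
    # newline="" writes line endings exactly as they appear in the input.
    with open(out_path, "w", encoding="utf-8", newline="") as writer:
        for infile in sorted(glob.glob(os.path.join(src_dir, "*.txt"))):
            print(infile)
            # Read raw bytes so we decide the encoding once per file.
            with open(infile, "rb") as f:
                raw = f.read()
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                text = raw.decode(fallback)
            writer.write(text)
    return out_path
```

You'd call it as merge_to_utf8(wrd). Note that cp1252 leaves a few byte values undefined, so the fallback can still raise; 'latin-1' decodes any byte sequence, but silently gives the wrong characters if the files are really in some other encoding.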

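Minor point: 'wtr' is a nonsensical mode string. 'w' and 'r' are separate modes that can't be combined, and 'w' on its own gives write access only (you'd need 'w+' to read as well). Since you only write to this file, simplify it to just 'w'; codecs.open() always opens the underlying file in binary mode anyway, so the 't' adds nothing.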
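To check the mode-string point, here's a small Python 3 sketch (built-in open() rather than codecs.open(), and throwaway helper names) showing that 'w' gives a write-only handle, 'w+' is what adds reading, and a combined 'wtr' is rejected outright.

```python
import os
import tempfile


def mode_is_readable(path, mode):
    """Open path with mode and report whether the handle allows reading."""
    with open(path, mode) as f:
        return f.readable()


def mode_is_valid(path, mode):
    """Return False if open() refuses the mode string itself."""
    try:
        with open(path, mode):
            return True
    except ValueError:
        return False


path = os.path.join(tempfile.mkdtemp(), "mode_demo.txt")
print(mode_is_readable(path, "w"))    # False: 'w' is write-only
print(mode_is_readable(path, "w+"))   # True: '+' adds read access
print(mode_is_valid(path, "wtr"))     # False: 'w' and 'r' can't be combined
```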
