I have a document in Spanish I’d like to format using Python. Problem is

Question


Asked: June 3, 2026


I have a document in Spanish I’d like to format using Python. Problem is that in the output file, the accented characters are messed up, in this manner: \xc3\xad.
I succeeded in keeping the proper characters when I did some similar editing a while back, and although I’ve tried everything I did then and more, somehow it won’t work this time.
This is the current version of the code:

# -*- coding: utf-8 -*- 

import re
import pickle

inputfile = open("input.txt").read()

pat = re.compile(r"(@.*\*)")

mylist = pat.findall(inputfile)

outputfile = open("output.txt", "w")

pickle.dump(mylist, outputfile)

outputfile.close()

I’m using Python 2.7 on Windows 7.
Can anyone see any obvious problems? The inputfile is encoded in utf-8, but I’ve tried encoding it latin-1 too. Thanks.

To clarify: My problem is that the accented characters don’t show up properly in the output.
It’s solved now, I just had to add this line as suggested by mata:

inputfile = inputfile.decode('utf-8')


1 Answer

Editorial Team · Answer 1 · 2026-06-03T14:59:24+00:00

If the input file is encoded in UTF-8, then you should decode it first to work with it:

import re
import pickle

# In Python 2, read() returns a byte string; decode it to get unicode text.
inputfile = open("input.txt").read()
inputfile = inputfile.decode('utf-8')

pat = re.compile(r"(@.*\*)")

mylist = pat.findall(inputfile)

outputfile = open("output.txt", "w")

pickle.dump(mylist, outputfile)

outputfile.close()
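
To see why the decode step matters, here is a minimal sketch (not part of the original answer) showing that the sequence \xc3\xad from the question is just the two UTF-8 bytes of the letter í. The sample string and the variable names are made up for illustration, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Raw bytes as they would come from a UTF-8 file read without decoding
# (hypothetical sample, not taken from the asker's document).
raw = b"@aqu\xc3\xad*"

# Decoding turns the two bytes \xc3\xad into the single character U+00ED.
text = raw.decode("utf-8")

# Same pattern as in the answer, written as a unicode literal.
pat = re.compile(u"(@.*\\*)")
matches = pat.findall(text)

byte_len = len(raw)    # the accented letter occupies two bytes here
char_len = len(text)   # after decoding it counts as one character
has_i_acute = u"\xed" in matches[0]
```

Comparing byte_len and char_len makes the difference visible: the undecoded data is one unit longer because the accented letter is still split into two bytes.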

The file created this way will contain a pickled version of your list. If you would rather have a human-readable file, then you might want to just write a plain text file.
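
As a rough illustration of that difference, the sketch below pickles a small list and also builds a plain-text version of it. The list contents are hypothetical; the point is only that pickle output is meant to be read back with pickle.loads, while joined and encoded text can be opened in any editor.

```python
import pickle

# Hypothetical list of matches, already decoded to text.
mylist = [u"@aqu\xed*", u"@ma\xf1ana*"]

# Pickle output is a serialization format intended for pickle.loads,
# not for reading by eye.
data = pickle.dumps(mylist)
restored = pickle.loads(data)

# A plain-text alternative: join the items and encode explicitly as UTF-8.
plain = u"\n".join(mylist).encode("utf-8")
```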
Also, a good way to deal with different encodings is to use the codecs module:

import re
import codecs

with codecs.open("input.txt", "r", "utf-8") as infile:
    inp = infile.read()

pat = re.compile(r"(@.*\*)")
mylist = pat.findall(inp)

with codecs.open("output.txt", "w", "utf-8") as outfile:
    outfile.write("\n".join(mylist))
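
The codecs approach can be tried end to end without touching real files. The following is a sketch under stated assumptions: the sample text, the temporary directory, and the path names are invented for the test, and only codecs.open, re, os, and tempfile from the standard library are used.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import re
import tempfile

# Hypothetical sample data standing in for input.txt.
sample = u"intro\n@aqu\xed est\xe1*\nresto\n@ma\xf1ana*\n"

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output.txt")

# Write the sample as UTF-8 so there is something to read back.
with codecs.open(in_path, "w", "utf-8") as f:
    f.write(sample)

# Read decoded text, collect the matches, and write them one per line.
with codecs.open(in_path, "r", "utf-8") as infile:
    inp = infile.read()

mylist = re.findall(u"(@.*\\*)", inp)

with codecs.open(out_path, "w", "utf-8") as outfile:
    outfile.write(u"\n".join(mylist))

# Read the result back as text to confirm the accents survived.
with codecs.open(out_path, "r", "utf-8") as check:
    result = check.read()
```

Because both files are opened with an explicit encoding, the program only ever handles decoded text, and the conversion to and from bytes happens at the file boundary.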
