Python 2.6
Python's string.replace() does not seem to work on a UTF-16-LE file. I can think of two approaches:
- Find a Python module that can handle Unicode string manipulation.
- Convert the target Unicode file to ASCII, use string.replace(), then convert it back. But I am worried that this may cause data loss.
Can the community suggest a good way to solve this? Thanks.
EDIT:
My code looks like this:
infile = open(inputfilename)
for s in infile:
    outfile.write(s.replace(targetText, replaceText))
It looks like the for loop parses the lines correctly. Did I make a mistake here?
EDIT2:
I’ve read the Python Unicode tutorial and tried the code below, which works. However, I’m wondering whether there is a better way to do this. Can anyone help? Thanks.
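import codecs

# Decode the input as UTF-16-LE so each line is a Unicode string
infile = codecs.open(infilename, 'r', encoding='utf-16-le')
newlines = []
for line in infile:
    newlines.append(line.replace(originalText, replacementText))
# Encode the result back to UTF-16-LE on output
outfile = codecs.open(outfilename, 'w', encoding='utf-16-le')
outfile.writelines(newlines)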
Do I need to close infile or outfile?
You don’t have a Unicode file. There is no such thing (unless you are the author of Notepad, which conflates “Unicode” and “UTF-16LE”).
Please read the Python Unicode HOWTO and Joel on Unicode.
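To see why replace() on the raw file contents finds nothing, here is a small sketch. The sample strings are made up for illustration. The file holds UTF-16-LE encoded bytes, not text, so a byte-level search for an ASCII pattern never matches:

```python
# Illustrative sample text (not from the original question)
text = u"hello world"

# These are the bytes that actually sit in a UTF-16-LE file
raw = text.encode("utf-16-le")

# In UTF-16-LE every ASCII character is followed by a NUL byte,
# so the plain byte sequence "world" does not occur in the data.
found_in_bytes = b"world" in raw

# Decoding first yields a Unicode string, and replace() then works as expected.
decoded = raw.decode("utf-16-le")
replaced = decoded.replace(u"world", u"there")
```

This is also why the ASCII round trip is unnecessary: decode to Unicode, do the replacement, and encode back to the same encoding.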
Update: I’m glad the suggested reading helped you. Here’s a better version of your code:
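A minimal sketch of what such a version could look like, assuming the same codecs.open() calls as in the question. The function name replace_in_file and its parameter names are hypothetical, chosen only for this example. It writes each line as soon as it has been processed instead of collecting the lines in a list, and it closes both files explicitly:

```python
import codecs

def replace_in_file(infilename, outfilename, original_text, replacement_text):
    # Hypothetical helper name; the open calls mirror the ones in the question.
    infile = codecs.open(infilename, 'r', encoding='utf-16-le')
    outfile = codecs.open(outfilename, 'w', encoding='utf-16-le')
    for line in infile:
        # Write each fixed line straight away; no need to save them in a list.
        outfile.write(line.replace(original_text, replacement_text))
    # Release both files as soon as the work is done.
    infile.close()
    outfile.close()
```

Note that original_text and replacement_text should be Unicode strings (u'...' in Python 2), since the lines read through codecs.open() are Unicode.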
It’s always a good habit to release resources (e.g. close files) immediately when you are finished with them. More importantly, with output files, the directory is usually not updated until you close the file.
Read up on the “with” statement to find out about even better practice with file handling.
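For example, the same loop could be written with “with” roughly as follows. This is only a sketch: the function name replace_in_file_with is made up for illustration, and the nested form is used because Python 2.6 does not accept several files in a single “with” statement. Each file is closed automatically when its block ends, even if an exception is raised inside it:

```python
import codecs

def replace_in_file_with(infilename, outfilename, original_text, replacement_text):
    # Hypothetical helper name, for illustration only.
    # Nested "with" blocks: each file is closed when its block is left.
    with codecs.open(infilename, 'r', encoding='utf-16-le') as infile:
        with codecs.open(outfilename, 'w', encoding='utf-16-le') as outfile:
            for line in infile:
                outfile.write(line.replace(original_text, replacement_text))
```

With this form there are no close() calls to forget.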