Question

Asked: May 15, 2026 (2026-05-15T09:43:52+00:00)

I’ve set up a script that basically does a large-scale find-and-replace on a plain text document.

At the moment it works fine with ASCII, UTF-8, and UTF-16 (and possibly others, but I’ve only tested these three) encoded documents so long as the encoding is specified inside the script (the example code below specifies UTF-16).

Is there a way to make the script automatically detect which of these character encodings is being used in the input file and automatically set the character encoding of the output file the same as the encoding used on the input file?

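findreplace = [
    ('term1', 'term2'),
]

# Read the raw bytes and decode them with the hard-coded encoding.
with open(infile, 'rb') as inF:
    s = inF.read().decode('utf-16')

# Apply each (old, new) replacement in turn.
for old, new in findreplace:
    s = s.replace(old, new)

# Encode the result with the same hard-coded encoding and write it out.
with open(outFile, 'wb') as outF:
    outF.write(s.encode('utf-16'))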

Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-05-15T09:43:53+00:00

From the link J.F. Sebastian posted: try chardet.

Keep in mind that in general it’s impossible to detect the character encoding of every input file 100% reliably – in other words, there are possible input files which could be interpreted equally well as any of several character encodings, and there may be no way to tell which one is actually being used. chardet uses some heuristic methods and gives you a confidence level indicating how “sure” it is that the character encoding it tells you is actually correct.
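As a rough sketch of how this could be wired into the script from the question: check for a byte order mark first (that part is unambiguous), then fall back to chardet's guess only if its confidence is high enough. The names guess_encoding and replace_in_file, the 0.5 confidence threshold, and the 'utf-8' default are all assumptions made for illustration, not part of chardet or of the original script.

```python
import codecs

try:
    import chardet  # third-party package; optional in this sketch
except ImportError:
    chardet = None


def guess_encoding(raw, min_confidence=0.5, default='utf-8'):
    """Guess the encoding of a byte string (hypothetical helper)."""
    # A BOM is strong evidence, so check it before any heuristics.
    # UTF-32 must come first: its little-endian BOM starts with the UTF-16 one.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    # No BOM: ask chardet, but only trust it above the chosen threshold.
    if chardet is not None:
        result = chardet.detect(raw)
        if result['encoding'] and result['confidence'] >= min_confidence:
            return result['encoding']
    # Assumed fallback when nothing better is known.
    return default


def replace_in_file(in_path, out_path, pairs):
    """Find-and-replace that writes the output in the guessed input encoding."""
    with open(in_path, 'rb') as f:
        raw = f.read()
    encoding = guess_encoding(raw)
    text = raw.decode(encoding)
    for old, new in pairs:
        text = text.replace(old, new)
    # Note: 'utf-16' and 'utf-32' write a BOM in the machine's native byte
    # order, so a big-endian input may come back out little-endian.
    with open(out_path, 'wb') as f:
        f.write(text.encode(encoding))
    return encoding
```

Something like replace_in_file(infile, outFile, findreplace) would then replace the read/replace/write steps in the question. If the guess is wrong, decode() may raise UnicodeDecodeError or silently produce garbage, so it is probably worth logging the returned encoding.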
