I need fastest way to convert files from latin1 to utf-8 in python. The

Asked: May 13, 2026

I need the fastest way to convert files from latin1 to utf-8 in Python. The files are large, about 2 GB (I am moving DB data). So far I have:

import codecs
infile = codecs.open(tmpfile, 'r', encoding='latin1')
outfile = codecs.open(tmpfile1, 'w', encoding='utf-8')
for line in infile:
     outfile.write(line)
infile.close()
outfile.close()

but it is still slow. The conversion takes one fourth of the whole migration time.

I could also use a Linux command-line utility if it is faster than native Python code.

1 Answer

Editorial Team · Answer 1 · 2026-05-13T22:23:39+00:00

You could use blocks larger than one line and do binary I/O; each might speed things up a bit (though on Linux binary I/O won’t, as it’s identical to text I/O):

BLOCKSIZE = 1024*1024  # read 1 MiB at a time; tune for your system
with open(tmpfile, 'rb') as inf:
  with open(tmpfile1, 'wb') as ouf:  # separate output file, as in the question
    while True:
      data = inf.read(BLOCKSIZE)
      if not data: break  # end of input
      converted = data.decode('latin1').encode('utf-8')
      ouf.write(converted)

The byte-by-byte parsing implied in by-line reading, line-end conversion (not on Linux;-), and codecs.open-style encoding-decoding, should be part of what’s slowing you down. This approach is also portable (like yours is), since control-characters such as \n need no translation among these codecs anyway (in any OS).

This only works for input codecs that have no multibyte characters, but 'latin1' is one of those (it does not matter whether the output codec has such characters or not).
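If the input codec did have multibyte characters, a fixed-size block could end in the middle of a character and a plain decode() on that block would fail. A minimal sketch of one way around that, assuming Python 3 and the standard library's incremental decoder; the function name transcode and its parameters are made up for illustration, and the output encoding is assumed to be stateless (such as utf-8), since each block is encoded on its own:

```python
import codecs

def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8",
              blocksize=1024 * 1024):
    """Re-encode src_path into dst_path one block at a time."""
    # The incremental decoder keeps any trailing partial character
    # buffered until the next block supplies the remaining bytes.
    decoder = codecs.getincrementaldecoder(src_encoding)()
    with open(src_path, "rb") as inf, open(dst_path, "wb") as ouf:
        while True:
            data = inf.read(blocksize)
            if not data:
                break
            ouf.write(decoder.decode(data).encode(dst_encoding))
        # Flush the decoder; this raises UnicodeDecodeError if the
        # file ends in the middle of a character.
        ouf.write(decoder.decode(b"", final=True).encode(dst_encoding))
```

For latin1 input this extra bookkeeping is not needed, so the simpler loop above stays the better choice for the question as asked.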

Try different block sizes to find the sweet spot performance-wise, depending on your disk, filesystem and available RAM.
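One rough way to look for that sweet spot is to time the same conversion at a few block sizes on a representative file. This is only a sketch, assuming Python 3; convert and time_blocksizes are hypothetical helper names, and results will be skewed by the OS file cache, so repeat the runs before trusting the numbers:

```python
import os
import tempfile
import time

def convert(src_path, dst_path, blocksize):
    """latin1 -> utf-8 conversion using the block loop from this answer."""
    with open(src_path, "rb") as inf, open(dst_path, "wb") as ouf:
        while True:
            data = inf.read(blocksize)
            if not data:
                break
            ouf.write(data.decode("latin1").encode("utf-8"))

def time_blocksizes(src_path, blocksizes):
    """Return {blocksize: elapsed seconds} for converting src_path."""
    timings = {}
    for blocksize in blocksizes:
        # Write to a throwaway file so every run does the same work.
        fd, dst_path = tempfile.mkstemp()
        os.close(fd)
        try:
            start = time.perf_counter()
            convert(src_path, dst_path, blocksize)
            timings[blocksize] = time.perf_counter() - start
        finally:
            os.remove(dst_path)
    return timings
```

Calling something like time_blocksizes(tmpfile, [64 * 1024, 1024 * 1024, 16 * 1024 * 1024]) gives a quick comparison; those sizes are just starting points, not recommendations.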

Edit: changed code per @John’s comment, and clarified a condition as per @gnibbler’s.
