I have a python code which reads many files. but some files are extremely

Question

Asked: May 13, 2026 (2026-05-13T10:42:56+00:00)

I have Python code that reads many files, but some of the files are extremely large, which causes errors in other parts of my code.
I want a way to check the character count of each file so that I can avoid reading the extremely large ones.
Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-13T10:42:57+00:00

os.stat(filepath).st_size

Assuming by ‘characters’ you mean bytes. ETA:

I need the total character count, just like what the command ‘wc filename’ gives me on Unix

In which mode? wc on its own will give you a line, word and byte count (the byte count being the same as the size from stat), not a count of Unicode characters.
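To make the difference between bytes and Unicode characters concrete, here is a minimal sketch. The helper name byte_and_char_counts and the UTF-8 default are assumptions for illustration only, and the demo reads the whole file into memory, so it is meant for small inputs.

```python
import os
import tempfile

def byte_and_char_counts(filepath, encoding="utf-8"):
    """Return (size in bytes, number of decoded code points) for one file.

    Hypothetical helper for illustration; assumes the file is small enough
    to decode in one go and that `encoding` is the right charset.
    """
    nbytes = os.stat(filepath).st_size
    # newline="" disables newline translation so every character is counted as stored.
    with open(filepath, "r", encoding=encoding, errors="replace", newline="") as f:
        nchars = len(f.read())
    return nbytes, nchars

# Demo: two of the characters below take two bytes each in UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("naïve café\n".encode("utf-8"))
    demo_path = tmp.name

demo_counts = byte_and_char_counts(demo_path)
os.remove(demo_path)
print(demo_counts)  # (bytes, characters)
```

For plain ASCII files the two numbers are equal, which is why the byte size is usually good enough for spotting oversized files.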

There is a switch -m which will use the locale’s current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn’t make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:

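import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Fall back to the filesystem encoding when no charset is given.
    if charset is None:
        charset = sys.getfilesystemencoding()
    # The function is codecs.getreader (all lowercase); it returns a StreamReader class.
    readerclass = codecs.getreader(charset)
    nchar = 0
    # Read raw bytes and let the reader decode them, replacing undecodable bytes.
    with open(filepath, 'rb') as f:
        reader = readerclass(f, 'replace')
        while True:
            chars = reader.read(1024*32)  # arbitrary chunk size
            if chars == '':
                break
            nchar += len(chars)
    return nchar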

sys.getfilesystemencoding() returns the encoding Python uses for file names, which on Unix is normally derived from the locale, so this approximates what wc -m does. If you know the file's encoding yourself (e.g. ‘utf-8’) then pass that in instead.

I don’t think you want to do this.
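For the original goal of skipping oversized files, the byte size alone should be enough. A minimal sketch of that check follows; the 10 MiB limit and the names MAX_BYTES, small_enough and read_small_files are assumptions for illustration, not anything from the question.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical limit (10 MiB); tune to your data

def small_enough(filepath, max_bytes=MAX_BYTES):
    """True if the file's size on disk is within the limit (no file content is read)."""
    return os.stat(filepath).st_size <= max_bytes

def read_small_files(paths, max_bytes=MAX_BYTES):
    """Yield (path, contents as bytes) for files within the limit; skip the rest."""
    for path in paths:
        if small_enough(path, max_bytes):
            with open(path, "rb") as f:
                yield path, f.read()
```

Because os.stat only looks at file metadata, the check costs the same no matter how large the file is.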
