How can I tell if a file is binary (non-text) in Python?
I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.
I know I could use grep -I, but I am doing more with the data than what grep allows for.
In the past, I would have just searched for characters greater than 0x7f, but utf8 and the like, make that impossible on modern systems. Ideally, the solution would be fast.
You can also use the mimetypes module:
It’s fairly easy to compile a list of binary mime types. For example Apache distributes with a mime.types file that you could parse into a set of lists, binary and text and then check to see if the mime is in your text or binary list.