The word file here refers to the shell file command, not to actual files. I want to determine whether a file is, for example, a video file (.mpg, .mkv, .avi). file is pretty good at returning image for image files, video for video files, and audio for audio files (and, for some reason, application/x-empty for text). My question is how reliable this is for identifying types. If I did a simple
file -ib deliverance.avi | grep video
would that work for all of the main video formats outlined here?
The results from file are less than perfect, and it has more problems with some types of files than with others. file basically just looks for particular pieces of binary data in predictable patterns to figure out file types. Unfortunately, some of the file types often used for video fall into this “problematic” category.

The newer container formats like .mp4 and .mkv can hold different kinds of data, so there are several MIME types that could properly apply, depending on what is being contained. For example, an .mp4 could properly be identified as video/mp4, audio/mp4, or application/mp4 depending on its content.

In practice, file often makes guesses that simply conform to common usage, and it may work perfectly well for you. For example, while I mentioned some theoretical difficulties with identifying Matroska files correctly, file basically just assumes that any Matroska file is a video. On the other hand, usage of the Ogg container is more evenly split between audio and video, and I believe the current version of file just splits the difference and identifies Ogg files as application/ogg, which wouldn’t fall into any of your categories.

The one thing I can say with certainty is that you want the most up-to-date version of file you can get your hands on. The “magic” files that contain the patterns to match against, and the MIME types that result from a match, are updated fairly often to include newer file types like WebM, or just to improve accuracy for older types.
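Given those caveats, a check slightly stricter than grep video might look at only the MIME type and treat known "either audio or video" containers separately instead of silently missing them. The following is a minimal sketch, not a definitive classifier: the function name classify_media is hypothetical, and it assumes a file(1) that supports the --brief and --mime-type options. The list of ambiguous containers (application/ogg, application/mp4) comes from the discussion above and is not exhaustive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a file by the MIME type that file(1) reports.
# Assumes file(1) supports --brief (omit the filename) and --mime-type
# (print only the type, without "; charset=...").
classify_media() {
    local mime
    mime=$(file --brief --mime-type -- "$1")
    case "$mime" in
        video/*) echo "video" ;;
        audio/*) echo "audio" ;;
        image/*) echo "image" ;;
        # Containers that may hold audio or video; file cannot tell which.
        application/ogg|application/mp4) echo "ambiguous ($mime)" ;;
        # Anything else, including error messages such as "cannot open ...".
        *) echo "other ($mime)" ;;
    esac
}

# Classify every file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(classify_media "$f")"
done
```

Saved as, say, classify.sh, it could be run as bash classify.sh *.avi *.mkv *.ogg. Matching on the video/ prefix rather than the substring video keeps the check tied to the MIME type itself. To see which version of file (and which magic database) you are actually using, run file --version.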