I am writing a little program that iterates through all files in a directory and searches for a substring.
It’s basically something like this:
s = File.ReadAllText(FileName)
If s.Contains("Find this substring") Then
    MatchesFound += 1
End If
I also have a Regex version of this program, but still using File.ReadAllText() to read the files.
Should I be concerned with calling File.ReadAllText() on binary files?
I don’t mind getting a few false positives in the search results, but I don’t want my program to crash.
MSDN docs don’t show any exceptions for this method that result from not being able to read or interpret file data.
Your program won’t crash because of binary content. If the file is very large, it may just use a lot of memory. ReadAllText closes the file handle before it returns, so the handle is properly disposed of without any extra work on your part.
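One way to see that the handle really is released is the small sketch below (the module and function names are made up for the demo). It writes a temporary file, reads it with File.ReadAllText, and deletes it right away. If ReadAllText had left the file open, File.Delete would fail (on Windows, with an IOException) because the file would still be in use.

```vbnet
Imports System
Imports System.IO

Module HandleReleaseDemo
    ' Hypothetical demo helper: write, read, then immediately delete a temp file.
    Function ReadThenDelete() As String
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, "Find this substring")
        ' ReadAllText opens the file, reads everything, and closes it before returning.
        Dim text As String = File.ReadAllText(tempFile)
        ' This succeeds only because no handle to tempFile is still open.
        File.Delete(tempFile)
        Return text
    End Function

    Sub Main()
        Console.WriteLine(ReadThenDelete())
    End Sub
End Module
```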
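The string will simply contain a text representation of the binary file, and most of it will probably be meaningless or invalid characters. ReadAllText decodes the bytes as UTF-8 (unless it detects a byte order mark for another encoding) and replaces byte sequences that aren’t valid UTF-8 with the replacement character U+FFFD instead of throwing. Internally, the framework stores strings as UTF-16.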
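A minimal sketch of what happens to binary data, assuming the default behavior described above (no byte order mark, so the bytes are decoded as UTF-8). The byte values and names are chosen only for illustration: &H80 is never valid on its own in UTF-8, and &HC3 must be followed by a continuation byte, so both come back as U+FFFD while the ASCII bytes survive.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module BinaryDecodeDemo
    ' Hypothetical helper: write a few non-text bytes and read them back as a string.
    Function DecodeBinarySample() As String
        Dim tempFile As String = Path.GetTempFileName()
        Try
            ' &H00 = NUL, &H80 = stray continuation byte, &HC3 &H28 = broken
            ' two-byte sequence, &H41 = "A". None of this makes ReadAllText throw.
            File.WriteAllBytes(tempFile, New Byte() {&H0, &H80, &HC3, &H28, &H41})
            Return File.ReadAllText(tempFile)
        Finally
            File.Delete(tempFile)
        End Try
    End Function

    Sub Main()
        Dim s As String = DecodeBinarySample()
        ' Count how many invalid byte sequences were replaced with U+FFFD.
        Console.WriteLine(s.Count(Function(c) c = ChrW(&HFFFD)))
    End Sub
End Module
```

Searching that string with Contains still works normally; it just won’t find text that isn’t there, apart from the occasional false positive the question already accepts.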
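The only thing you should be concerned about is extremely large files, such as a 4 GB ISO image. ReadAllText has to hold the whole file in memory as a single string, so a file that size can exhaust memory or exceed the maximum string length and throw an OutOfMemoryException. If your directory can contain files that big, use a more efficient approach (for example, reading the file in chunks) instead of blindly calling ReadAllText.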
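As one possible "better algorithm", here is a sketch of a chunked search, assuming you only need to know whether a file contains the substring (not how many times). The function name FileContains and the default buffer size of 81920 characters are arbitrary choices for illustration. It keeps the last needle.Length - 1 characters of each chunk so a match that straddles a chunk boundary is still found, and memory use stays bounded by the buffer size rather than the file size.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ChunkedSearch
    ' Hypothetical helper: returns True if the file contains needle, reading at
    ' most bufferSize characters at a time instead of the whole file.
    Function FileContains(path As String, needle As String,
                          Optional bufferSize As Integer = 81920) As Boolean
        If String.IsNullOrEmpty(needle) Then Return True
        Dim overlap As Integer = needle.Length - 1
        Dim buffer(bufferSize - 1) As Char
        Dim carry As String = String.Empty
        ' UTF-8 with byte order mark detection, similar to ReadAllText's default.
        Using reader As New StreamReader(path, Encoding.UTF8, True)
            Do
                Dim read As Integer = reader.Read(buffer, 0, buffer.Length)
                If read = 0 Then Exit Do
                ' Prepend the tail of the previous chunk to catch boundary-spanning matches.
                Dim window As String = carry & New String(buffer, 0, read)
                If window.Contains(needle) Then Return True
                ' Keep only the characters that could still start a match.
                carry = If(window.Length > overlap, window.Substring(window.Length - overlap), window)
            Loop
        End Using
        Return False
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, "0123456789Find this substring0123")
        Console.WriteLine(FileContains(tempFile, "Find this substring", 4))
        File.Delete(tempFile)
    End Sub
End Module
```

The tiny buffer in Main is only there to force the match across several chunks; in practice you would keep the default.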
Also, before reading a file you can check its size, and if it’s obviously a pure binary file (for example, a 100 MB zip file), you can skip it and move on to the next one.
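Applied to the loop from the question, that check might look like the sketch below. The 50 MB cut-off and the list of "obviously binary" extensions are hypothetical values picked for illustration, not recommendations; FileInfo.Length gives the size without opening the file, so skipped files are never read at all. Only the top level of the folder is searched.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SizeFilteredSearch
    ' Hypothetical cut-off: files larger than this are skipped instead of read.
    Private Const MaxBytesToRead As Long = 50L * 1024 * 1024

    ' Hypothetical extensions treated as obviously binary (case-insensitive).
    Private ReadOnly SkippedExtensions As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From {
        ".zip", ".iso", ".exe", ".dll"
    }

    ' Counts files in folder whose text contains searchText.
    Function CountMatches(folder As String, searchText As String) As Integer
        Dim matchesFound As Integer = 0
        For Each fileName As String In Directory.EnumerateFiles(folder)
            Dim info As New FileInfo(fileName)
            ' Skip big files and binary-looking extensions before reading anything.
            If info.Length > MaxBytesToRead OrElse SkippedExtensions.Contains(info.Extension) Then
                Continue For
            End If
            Dim s As String = File.ReadAllText(fileName)
            If s.Contains(searchText) Then
                matchesFound += 1
            End If
        Next
        Return matchesFound
    End Function

    Sub Main()
        Console.WriteLine(CountMatches(Directory.GetCurrentDirectory(), "Find this substring"))
    End Sub
End Module
```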