I have two binary files, "bigFile.bin" and "smallFile.bin".
The "bigFile.bin" contains "smallFile.bin".
Opening both in Beyond Compare confirms that.
I want to extract the smaller file from the bigger one into a "result.bin" that equals "smallFile.bin".
I have two keywords: one for the start position ("Section") and one for the end position ("Man").
I tried the following:
byte[] bigFile = File.ReadAllBytes("bigFile.bin");
UTF8Encoding enc = new UTF8Encoding();
string text = enc.GetString(bigFile);
int startIndex = text.IndexOf("Section");
int endIndex = text.IndexOf("Man");
string smallFile = text.Substring(startIndex, endIndex - startIndex);
File.WriteAllBytes("result.bin",enc.GetBytes(smallFile));
I compared the result file with the original small file in Beyond Compare, which shows a hex comparison.
Most of the bytes are equal, but some are not.
For example, in the new file I have 84 but in the old file I have an EF BF BD sequence instead.
What can cause those differences? Where am I mistaken?
Since you are working with binary files, you should not use text-related functionality (which includes encodings etc). Work with byte-related methods instead.
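To see why the text round trip changes bytes, here is a minimal sketch (the class and method names are made up for illustration). A lone 0x84 is not valid UTF-8, so GetString replaces it with the Unicode replacement character U+FFFD, and GetBytes then writes that character out as EF BF BD. The original byte is lost at the decoding step and cannot be recovered afterwards.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Decodes the bytes as UTF-8 and encodes the resulting string again,
    // which is what the code in the question does to the whole file.
    public static byte[] RoundTrip(byte[] data)
    {
        var enc = new UTF8Encoding();
        string text = enc.GetString(data);   // invalid sequences become U+FFFD
        return enc.GetBytes(text);           // U+FFFD is written as EF BF BD
    }

    public static void Main()
    {
        // 'S', a stray continuation byte (invalid on its own), 'M'
        byte[] original = { 0x53, 0x84, 0x4D };

        Console.WriteLine(BitConverter.ToString(original));            // 53-84-4D
        Console.WriteLine(BitConverter.ToString(RoundTrip(original))); // 53-EF-BF-BD-4D
    }
}
```

Any byte sequence that is not well-formed UTF-8 is affected the same way, which is why most of the file matches and only some bytes differ.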
Your current code could be converted to work by making it into something like this:
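A minimal sketch of a byte-only version, under a few assumptions: the markers are plain ASCII, the end marker is searched for after the start marker (the code in the question searches from the beginning of the file), and the end marker itself is excluded from the result, as it is with Substring. ByteExtractor, IndexOf and Extract are hypothetical names, not framework APIs.

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteExtractor
{
    // Hypothetical helper: index of the first occurrence of pattern in data
    // at or after 'from', or -1 if there is none. Naive search, fine for a sketch.
    public static int IndexOf(byte[] data, byte[] pattern, int from = 0)
    {
        for (int i = from; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Returns the bytes from the start marker up to (not including) the end marker.
    public static byte[] Extract(byte[] bigFile, string startMarker, string endMarker)
    {
        // Assumption: both markers are ASCII text.
        byte[] start = Encoding.ASCII.GetBytes(startMarker);
        byte[] end = Encoding.ASCII.GetBytes(endMarker);

        int startIndex = IndexOf(bigFile, start);
        if (startIndex < 0) throw new InvalidDataException("Start marker not found.");

        // Assumption: the end marker is looked for only after the start marker.
        int endIndex = IndexOf(bigFile, end, startIndex + start.Length);
        if (endIndex < 0) throw new InvalidDataException("End marker not found.");

        int length = endIndex - startIndex;
        var smallFile = new byte[length];
        Array.Copy(bigFile, startIndex, smallFile, 0, length);
        return smallFile;
    }
}
```

It could be called as File.WriteAllBytes("result.bin", ByteExtractor.Extract(File.ReadAllBytes("bigFile.bin"), "Section", "Man")). No bytes pass through a string, so nothing gets replaced on the way.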
To find startIndex and endIndex you could even use your previous technique, but searching the bytes directly would be more appropriate.
However, this would still be problematic, because it holds the whole file in memory; it would be better to work with a Stream rather than an array of bytes.
So, what to do?
1. Open a FileStream instead.
2. Wrap a StreamReader around the FileStream and use it to find the markers for the start and end indexes. Even better, change your file format so that you don't need to search for text.
3. Once you know startIndex and length, use stream functions to seek to the relevant part of your input stream and copy length bytes to the output stream.
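The seek-and-copy step could be sketched roughly as follows. This assumes startIndex and length are already known (finding the markers is left out) and that the input stream is seekable; StreamExtractor and CopyRange are hypothetical names, and the buffer size is an arbitrary choice.

```csharp
using System;
using System.IO;

public static class StreamExtractor
{
    // Copies 'length' bytes starting at byte offset 'startIndex' from input to output.
    public static void CopyRange(Stream input, Stream output, long startIndex, long length)
    {
        input.Seek(startIndex, SeekOrigin.Begin);

        var buffer = new byte[81920];   // arbitrary buffer size
        long remaining = length;
        while (remaining > 0)
        {
            int toRead = (int)Math.Min(buffer.Length, remaining);
            int read = input.Read(buffer, 0, toRead);
            if (read == 0)
                throw new EndOfStreamException("Input ended before all requested bytes were copied.");
            output.Write(buffer, 0, read);
            remaining -= read;
        }
    }

    // Convenience overload that works on file paths.
    public static void CopyRange(string inputPath, string outputPath, long startIndex, long length)
    {
        using (var input = File.OpenRead(inputPath))
        using (var output = File.Create(outputPath))
        {
            CopyRange(input, output, startIndex, length);
        }
    }
}
```

For example, StreamExtractor.CopyRange("bigFile.bin", "result.bin", startIndex, length) copies the range without ever holding more than one buffer of the big file in memory.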