I have a ~700 MB binary file (non-text data); what I would like to

Question


Asked: May 21, 2026


I have a ~700 MB binary file (non-text data); what I would like to do is search for a specific pattern of bytes that occurs at random locations throughout the file, e.g. 0x? 0x? 0x55 0x? 0x? 0x55 0x? 0x? 0x55 0x? 0x? 0x55 and so on, for 50 or so bytes in sequence. The pattern I’d be searching for is a repeating sequence of two arbitrary bytes followed by 0x55, so 0x55 appears at every third byte.
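To make the pattern concrete, here is a minimal C++ sketch (not from the original post) that checks whether a table starts at a given offset. It assumes every group is exactly two data bytes followed by the 0x55 delimiter; `matches_table_at` and `min_groups` are hypothetical names. About 16 groups would cover the "50 or so bytes" mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Delimiter value taken from the question.
constexpr std::uint8_t kDelim = 0x55;

// Returns true if buf[pos..] matches "?? ?? 55" repeated at least min_groups
// times (two arbitrary data bytes, then the delimiter, in each 3-byte group).
bool matches_table_at(const std::uint8_t* buf, std::size_t len,
                      std::size_t pos, std::size_t min_groups) {
    assert(min_groups > 0);
    // Not enough bytes left for min_groups complete groups.
    if (pos > len || len - pos < 3 * min_groups) return false;
    for (std::size_t g = 0; g < min_groups; ++g) {
        // Only the delimiter position is constrained; data bytes are wildcards.
        if (buf[pos + 3 * g + 2] != kDelim) return false;
    }
    return true;
}
```

Calling this at every offset is exactly the look-ahead approach described in the question; it costs at most `min_groups` comparisons per offset and usually fails on the first one.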

In other words, I want to find tables stored in the file that use 0x55 as a delimiter, and then save or otherwise manipulate the data they contain.
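Once a table's starting offset is known, saving its contents could look like the sketch below. It keeps each two-byte entry raw, because the post doesn't say whether an entry is two separate values or one 16-bit value (and, if the latter, in which byte order). `extract_entries` and `Entry` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Entry = std::array<std::uint8_t, 2>;  // two raw data bytes per table row

// Collects consecutive "a b delim" groups starting at pos and stops at the
// first group whose third byte is not the delimiter (or at end of buffer).
std::vector<Entry> extract_entries(const std::uint8_t* buf, std::size_t len,
                                   std::size_t pos, std::uint8_t delim = 0x55) {
    assert(buf != nullptr || len == 0);
    std::vector<Entry> out;
    while (pos <= len && len - pos >= 3 && buf[pos + 2] == delim) {
        out.push_back({buf[pos], buf[pos + 1]});
        pos += 3;
    }
    return out;
}
```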

Would the best option be simply to go through every byte one at a time, look ahead two bytes to see whether that byte is 0x55, and, if it is, keep looking ahead to confirm that a table exists at that location?
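Byte-at-a-time is workable for 700 MB, but restarting the look-ahead at every offset re-reads the same bytes many times. One alternative (a suggestion, not what the answer below used) is a single pass that keeps three run-length counters, one per position modulo 3, so every byte is examined exactly once. `find_delimiter_runs`, `Run`, and `min_run` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One candidate table: position of its first delimiter and how many
// delimiters occur in a row at a stride of 3 bytes.
struct Run {
    std::size_t first_delim;
    std::size_t count;
};

// Single pass over buf; reports every stride-3 run of at least min_run delimiters.
std::vector<Run> find_delimiter_runs(const std::uint8_t* buf, std::size_t len,
                                     std::size_t min_run, std::uint8_t delim = 0x55) {
    assert(min_run > 0);
    std::vector<Run> runs;
    // count[p] = length of the run currently open at positions i with i % 3 == p.
    std::size_t count[3] = {0, 0, 0};
    for (std::size_t i = 0; i < len; ++i) {
        std::size_t& c = count[i % 3];
        if (buf[i] == delim) {
            ++c;
        } else {
            // Run broken at i: its delimiters were at i-3, i-6, ..., i-3c.
            if (c >= min_run) runs.push_back({i - 3 * c, c});
            c = 0;
        }
    }
    // Close runs still open at the end, as if broken at the next index of that phase.
    for (std::size_t p = 0; p < 3; ++p) {
        const std::size_t next = len + (p + 3 - len % 3) % 3;
        if (count[p] >= min_run) runs.push_back({next - 3 * count[p], count[p]});
    }
    std::sort(runs.begin(), runs.end(),
              [](const Run& a, const Run& b) { return a.first_delim < b.first_delim; });
    return runs;
}
```

A run's first data byte sits two bytes before `first_delim`; if `first_delim` is 0 or 1, the first group is incomplete and should be skipped.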

Load the whole thing into memory? fseek? Read it in buffered chunks and search those one byte at a time?
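On the I/O side, 700 MB may well fit in RAM, so loading the whole file (or memory-mapping it with `mmap` on POSIX or `CreateFileMapping` on Windows) is a valid option; `fseek` alone doesn't help, since every byte has to be read anyway. If memory is a concern, a sketch like the one below reads fixed-size chunks and carries the last `overlap` bytes into the next window, so a match straddling a chunk boundary isn't missed. With `overlap` set to the pattern length minus one, each fixed-length match lies entirely inside exactly one window. `scan_file_in_chunks` and its callback signature are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Callback receives a window of bytes and the file offset of window[0].
using WindowFn = std::function<void(const std::uint8_t*, std::size_t, std::uint64_t)>;

// Reads `path` in chunks of chunk_size bytes. Each window passed to on_window
// starts with the last `overlap` bytes of the previous window, so a match up to
// overlap + 1 bytes long that straddles a chunk boundary still appears whole.
// Returns false if the file cannot be opened.
bool scan_file_in_chunks(const std::string& path, std::size_t chunk_size,
                         std::size_t overlap, const WindowFn& on_window) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::vector<std::uint8_t> buf(overlap + chunk_size);
    std::size_t kept = 0;       // bytes carried over from the previous window
    std::uint64_t offset = 0;   // file offset of buf[0]
    for (;;) {
        in.read(reinterpret_cast<char*>(buf.data() + kept),
                static_cast<std::streamsize>(chunk_size));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;    // EOF: the carried bytes were already scanned
        const std::size_t size = kept + got;
        on_window(buf.data(), size, offset);
        // Move the tail of this window to the front for the next read.
        const std::size_t carry = std::min(overlap, size);
        std::memmove(buf.data(), buf.data() + size - carry, carry);
        offset += size - carry;
        kept = carry;
    }
    return true;
}
```

Each window can be handed to whatever matcher you prefer; adding the `file_offset` argument to a window-relative hit gives its absolute position in the file.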

What would be the best way of looking through this large file, and finding the pattern, using C or C++?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T08:45:23+00:00

What ultimately worked for me was a hybrid of the Boyer-Moore-Horspool algorithm (suggested by Jerry Coffin) and my own algorithm based on the structure of the tables and the data stored in them.
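For reference, a textbook Boyer-Moore-Horspool search over a byte buffer looks roughly like this. BMH needs a fixed needle, and the answer doesn't say which fixed byte sequence it searched for, so the needle here is whatever literal sequence you supply; wildcard bytes like the `0x?` positions in the question aren't supported by this version. C++17 also ships `std::boyer_moore_horspool_searcher` in `<functional>` for use with `std::search`. `bmh_find_all` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Boyer-Moore-Horspool: returns the start offset of every (possibly overlapping)
// occurrence of pat[0..m) in hay[0..n).
std::vector<std::size_t> bmh_find_all(const std::uint8_t* hay, std::size_t n,
                                      const std::uint8_t* pat, std::size_t m) {
    assert(m > 0);
    std::vector<std::size_t> hits;
    if (m > n) return hits;
    // Bad-character table: how far the window may slide when its last byte is c.
    // Bytes that do not occur in pat[0..m-1) allow a full shift of m.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) shift[pat[i]] = m - 1 - i;
    std::size_t pos = 0;
    while (pos + m <= n) {
        if (std::memcmp(hay + pos, pat, m) == 0) hits.push_back(pos);
        // Shift based on the haystack byte aligned with the needle's last byte.
        pos += shift[hay[pos + m - 1]];
    }
    return hits;
}
```

Simply adding wildcards would not help with the raw `?? ?? 55` pattern: a wildcard in the second-to-last needle position limits every shift to 1 byte, which is no faster than a plain scan.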

Basically, the BMH algorithm caught most of the things I was looking for. The obvious stuff.

But some tables turned out to have odd formatting, so I had to implement a semi-intelligent search that examined the data following each 0x55 and judged whether it was likely to be good data or just random junk.
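The answer doesn't describe the heuristics it used, so the following only illustrates the kind of check that could separate tables from junk. The rules are assumptions: a minimum number of entries, a minimum number of distinct entries (so constant filler such as `00 00 55 00 00 55 ...` is rejected), and rejecting runs where more than half the data bytes are 0x55 themselves, since a long stretch of 0x55 bytes matches `?? ?? 55` at every offset. `looks_like_table` and its parameters are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using Entry = std::array<std::uint8_t, 2>;  // two raw data bytes per table row

// Hypothetical plausibility test for a candidate table's entries.
bool looks_like_table(const std::vector<Entry>& entries, std::size_t min_entries,
                      std::size_t min_distinct, std::uint8_t delim = 0x55) {
    assert(min_distinct >= 1);
    // Rule 1: too short to be a real table.
    if (entries.size() < min_entries) return false;
    // Rule 2: a long stretch of delimiter bytes matches "?? ?? 55" everywhere,
    // so reject runs where more than half of the data bytes equal the delimiter.
    std::size_t delim_bytes = 0;
    for (const Entry& e : entries) {
        delim_bytes += static_cast<std::size_t>(e[0] == delim) +
                       static_cast<std::size_t>(e[1] == delim);
    }
    if (delim_bytes > entries.size()) return false;  // more than half of 2 * size
    // Rule 3: constant filler has too few distinct entries.
    const std::set<Entry> distinct(entries.begin(), entries.end());
    return distinct.size() >= min_distinct;
}
```

The thresholds would need tuning against real data from the file.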

Oddly enough, I ended up implementing it in PHP rather than C++, and dumping the results straight into a MySQL database for querying. The search took about 5 minutes or less, and the results were largely good. I did end up with a lot of junk data, but it caught everything I needed, and (as far as I’m aware) did not leave any good data behind.
