Say I have generated the following binary file:
# generate file:
python3 -c 'import sys; sys.stdout.buffer.write(bytes([0,0,0,0,2,4,6,8,0,1,3,0,5,20]))' > mydata.bin
# get file size in bytes
stat -c '%s' mydata.bin
# 14
And say I want to find the locations of all zero bytes (0x00) using a grep-like syntax.
The best I can do so far is:
$ hexdump -v -e "1/1 \" %02x\n\"" mydata.bin | grep -n '00'
1: 00
2: 00
3: 00
4: 00
9: 00
12: 00
However, this implicitly converts each byte of the original binary file into a multi-byte ASCII text representation (here a space, two hex digits and a newline per byte), on which grep then operates; not exactly the prime example of optimization 🙂
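For comparison, here is a minimal sketch of the same search in Python 3 (assumed available, since it was used to generate the file). It scans the raw bytes for 0x00 directly instead of grepping a hex dump; it assumes the data fits in memory, and `find_byte` is a hypothetical helper name.

```python
def find_byte(data: bytes, value: int) -> list:
    """Return the 0-based offset of every occurrence of `value` in `data`."""
    offsets = []
    pos = data.find(value)  # bytes.find accepts an int in range 0..255
    while pos != -1:
        offsets.append(pos)
        pos = data.find(value, pos + 1)
    return offsets


if __name__ == "__main__":
    # The same 14 bytes as mydata.bin above.
    data = bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20])
    for off in find_byte(data, 0x00):
        # grep -n numbers lines from 1, so hexdump line = offset + 1.
        print(f"offset {off} (hexdump line {off + 1})")
```

The offsets 0, 1, 2, 3, 8 and 11 correspond to lines 1, 2, 3, 4, 9 and 12 of the hexdump/grep output.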
Is there something like a binary grep for Linux? Ideally, something that would also support a regular-expression-like syntax, but for byte “characters”; that is, I could write something like ‘a(\x00*)b’ and match ‘zero or more’ occurrences of byte 0 between bytes ‘a’ (97) and ‘b’ (98)?
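As one possible answer to the regex part: Python's standard `re` module accepts bytes patterns, in which every byte is one “character”, so the pattern can be written almost exactly as above. A minimal sketch, assuming Python 3; `find_runs` is a hypothetical helper and the sample bytes are made up for illustration.

```python
import re

# A regex compiled from a bytes literal matches bytes, not text:
# each byte is one "character", and \x00 is written as in the question.
PATTERN = re.compile(rb"a(\x00*)b")


def find_runs(data: bytes):
    """Yield (start offset, number of zero bytes) for each a, zeros, b match."""
    for m in PATTERN.finditer(data):
        yield m.start(), len(m.group(1))


if __name__ == "__main__":
    sample = b"xxa\x00\x00\x00b--ab--a\x00c"
    for start, zeros in find_runs(sample):
        print(f"match at offset {start}: {zeros} zero byte(s) between 'a' and 'b'")
```

A bytes pattern can only be applied to bytes-like data; mixing it with a `str` raises `TypeError`.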
EDIT: The context is that I’m working on a driver where I capture 8-bit data. Something goes wrong in the data, which can range from kilobytes up to megabytes, and I’d like to check for particular signatures and where they occur. So far I’m working with kilobyte snippets, so optimization is not that important; but if I start getting errors in megabyte-long captures and need to analyze those, I would probably want something more optimized 🙂. Above all, I’d like something where I can “grep” for a byte as a character; hexdump forces me to search for the text representation of each byte instead.
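For megabyte-sized captures, one option (a sketch, not a benchmarked recommendation) is to memory-map the file and run a bytes regex over the mapping, which Python's `mmap` documentation explicitly allows. `scan_file` is a hypothetical helper name; the demo writes the 14 bytes of mydata.bin to a temporary file and reports each run of zero bytes.

```python
import mmap
import os
import re
import tempfile


def scan_file(path, pattern):
    """Return (start, end) offsets of every match of a bytes regex in a file.

    The file is memory-mapped, so the OS pages it in on demand rather than
    the whole capture being copied into a Python bytes object up front.
    """
    regex = re.compile(pattern, re.DOTALL)  # DOTALL: "." also matches 0x0A
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:  # mmap cannot map an empty file
            return []
        with mm:
            return [(m.start(), m.end()) for m in regex.finditer(mm)]


if __name__ == "__main__":
    # Demo: a temporary copy of the 14-byte mydata.bin from the question.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20]))
    try:
        # rb"\x00+" reports each run of consecutive zero bytes as one match.
        for start, end in scan_file(path, rb"\x00+"):
            print(f"{end - start} zero byte(s) starting at offset {start}")
    finally:
        os.remove(path)
```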
EDIT2: Same question, different forum 🙂: “grepping through a binary file for a sequence of bytes”
EDIT3: Thanks to the answer by @tchrist, here is also an example with ‘grepping’ and matching, and displaying results (although not quite the same question as OP):
$ perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/gs' /path/to/myfile.bin
ca000000cb000000cc000000cd000000ce # Matched data (hex)
66357 # Offset (dec) just past the end of the match, as returned by pos()
To have the matched data grouped as one byte (two hex characters) each, “H2 H2 H2 …” must be specified once for every byte in the matched string; as my match ‘.....\0\0\0\xCC\0\0\0.....’ covers 17 bytes, I can write ‘"H2"x17’ in Perl. Each “H2” returns a separate element of the resulting list, so join is also needed to put spaces between them; eventually:
$ perl -ln0777e 'print join(" ", unpack("H2 "x17,$1)), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/gs' /path/to/myfile.bin
ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce
66357
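For comparison, a rough Python 3 equivalent of the two Perl one-liners above, sketched under the assumption of Python 3.8+ (for the separator argument of `bytes.hex`). `find_matches` and `spaced_hex` are hypothetical names, and the demo data (little-endian 32-bit counters) is made up so that it contains the same 17 matched bytes. `re.DOTALL` plays the role of Perl's `/s`, and the hex formatting works for any match length without spelling out a byte count.

```python
import re

# The 17-byte pattern from EDIT3: any 5 bytes, 00 00 00 CC 00 00 00, any 5 bytes.
# re.DOTALL lets "." match 0x0A as well; without it, a newline byte in one of
# the wildcard positions would silently prevent a match.
PATTERN = re.compile(rb".{5}\x00\x00\x00\xCC\x00\x00\x00.{5}", re.DOTALL)


def spaced_hex(chunk: bytes) -> str:
    """Two hex digits per byte, space-separated, for any chunk length."""
    # bytes.hex(sep) needs Python 3.8+; on older versions use
    # " ".join("%02x" % b for b in chunk) instead.
    return chunk.hex(" ")


def find_matches(data: bytes):
    """Yield (spaced hex of the match, start offset, end offset)."""
    for m in PATTERN.finditer(data):
        # m.end() is the value Perl's pos() reports after a //g match;
        # m.start() is the offset of the first matched byte.
        yield spaced_hex(m.group(0)), m.start(), m.end()


if __name__ == "__main__":
    # Made-up demo data: little-endian 32-bit counters 0xC8..0xCF.
    data = b"".join(n.to_bytes(4, "little") for n in range(0xC8, 0xD0))
    for hex_str, start, end in find_matches(data):
        print(hex_str)
        print(start, end)
```

Run as a script, the demo prints `ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce` followed by `8 25`: the match starts at offset 8, and 25 is the pos()-style offset just past its end.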
Well, Perl is indeed a very nice ‘binary grepping’ tool, I must admit 🙂, as long as one learns the syntax properly 🙂
One-Liner Input
Here’s the shorter one-liner version:
And here’s a slightly longer one-liner:
The way to connect those two one-liners is by uncompiling the first one’s program:
Programmed Input
If you want to put that in a file instead of calling it from the command line, here’s a somewhat more explicit version:
And here’s the really long version:
One-Liner Output
BTW, to create the test input file, I didn’t use your big, long Python script; I just used this simple Perl one-liner:
You’ll find that Perl often winds up being 2-3 times shorter than Python to do the same job. And you don’t have to compromise on clarity; what could be simpler than the one-liner above?
Programmed Output
I know, I know. If you don’t already know the language, this might be clearer:
although this works, too:
as does
Although for those who like everything all rigorous and careful and all, this might be more what you would see:
TMTOWTDI
Perl supports more than one way to do things so that you can pick the one that you’re most comfortable with. If this were something I planned to check in as a school or work project, I would certainly select the longer, more careful versions, or at least put a comment in the shell script if I were using the one-liners.
You can find documentation for Perl on your own system. Just type
etc. at your shell prompt. If you want pretty-ish versions on the web instead, get the manpages for perl, perlrun, perlvar, and perlfunc from http://perldoc.perl.org.