I am using Exuberant Ctags (“ctags -e”, also known as just “etags”),
and I am trying to understand the TAGS file format that the etags command generates. In particular, I want to understand line #2 of the TAGS file.
Wikipedia says that line #2 is described like this:
{src_file},{size_of_tag_definition_data_in_bytes}
In practice, though, line 2 of the TAGS file for “foo.c” looks like this:
foo.c,1683
My question is how exactly this number, 1683, is computed.
I know it is the size of the “tag_definition”, so what I want to know is:
what exactly is the “tag_definition”?
I have tried looking through the ctags source code, but perhaps someone better at C than I am will have more success figuring this out.
Thanks!
EDIT #2:
^L^J
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
float bar () {^?bar^A7,59^J
int main() {^?main^A11,91^J
Alright, so if I understand correctly, “79” is the number of bytes in the TAGS file starting right after “hello.c,79^J” and running through “91^J”, inclusive.
Makes perfect sense.
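As a sanity check, here is a minimal Python sketch that rebuilds the three tag lines from the example above as raw bytes and counts them. It assumes that ^? is the byte 0x7F, ^A is 0x01, and ^J is a single 0x0A; the variable names are only for illustration and do not come from etags.

```python
# Control characters as they appear in the listing (assumed byte values).
DEL = b"\x7f"  # shown as ^?
SOH = b"\x01"  # shown as ^A
LF = b"\n"     # shown as ^J

# The three tag lines that follow "hello.c,79^J" in the example.
tag_lines = [
    b"float foo (float x) {" + DEL + b"foo" + SOH + b"3,20" + LF,
    b"float bar () {" + DEL + b"bar" + SOH + b"7,59" + LF,
    b"int main() {" + DEL + b"main" + SOH + b"11,91" + LF,
]

line_sizes = [len(line) for line in tag_lines]
total = sum(line_sizes)

print(line_sizes)  # [31, 24, 24]
print(total)       # 79
```

The total matches the number in the header line, so the count covers exactly these three lines, newlines included, and nothing from the “hello.c,79^J” line itself.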
Now, Wikipedia says that the numbers 20, 59, and 91 in this example are the {byte_offset}.
What is the {byte_offset} an offset from?
Thanks for all the help Ken!
It’s the number of bytes of tag data following the newline after the number.
Edit: It also doesn’t include the ^L character that separates one file’s tag data from the next. Remember, etags comes from a time long ago when reading a 500KB file was an expensive operation. 😉
Here’s a complete tags file. I’m showing it two ways. The first uses ^X notation for control characters, so nothing is invisible; the end-of-line characters that are implicit in your example appear as ^J here:
Here’s the same file displayed in hex:
There are two sets of tag data in this example: 45 bytes of data for hello.cc and 15 bytes for hello.h.
The hello.cc data starts on the line following “hello.cc,45^J” and runs for 45 bytes, which here also happens to be a whole number of lines. The size is given in bytes so that code reading the file can simply allocate room for a 45-byte string and read 45 bytes. The “^L^J” line comes right after the 45 bytes of tag data. A reader can use it as a marker that more files follow, and also to verify that the file is properly formatted.
The hello.h data starts on the line following “hello.h,15^J” and runs for 15 bytes.
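The reading strategy described above can be sketched in Python roughly as follows. This is only an illustration, not how etags or Emacs actually reads the file: `read_sections` is a made-up helper name, the first section reuses the hello.c data from the question, and the second section (`other.h`) is a hypothetical file whose size is computed rather than taken from any real TAGS file.

```python
import io

def read_sections(stream):
    """Yield (filename, tag_data) pairs from a binary TAGS-style stream."""
    while True:
        marker = stream.readline()
        if marker == b"":
            return  # clean end of file, no more sections
        if marker != b"\x0c\n":
            raise ValueError("expected a ^L^J separator line")
        # Header line looks like b"hello.c,79\n".
        header = stream.readline().rstrip(b"\n")
        name, _, size_text = header.rpartition(b",")
        size = int(size_text.decode("ascii"))
        # The size says exactly how many bytes of tag data follow.
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("tag data is shorter than the header claims")
        yield name.decode("ascii"), data

# First section: the hello.c example from the question (79 bytes).
hello_c = (
    b"float foo (float x) {" + b"\x7f" + b"foo" + b"\x01" + b"3,20\n"
    + b"float bar () {" + b"\x7f" + b"bar" + b"\x01" + b"7,59\n"
    + b"int main() {" + b"\x7f" + b"main" + b"\x01" + b"11,91\n"
)
# Second section: hypothetical data, only here to exercise the loop.
other_h = b"int baz();" + b"\x7f" + b"baz" + b"\x01" + b"1,0\n"

sample = (
    b"\x0c\nhello.c," + str(len(hello_c)).encode() + b"\n" + hello_c
    + b"\x0c\nother.h," + str(len(other_h)).encode() + b"\n" + other_h
)

sections = list(read_sections(io.BytesIO(sample)))
for filename, data in sections:
    print(filename, len(data))
```

Because the header gives the size up front, the reader never has to scan the tag data for the next “^L^J”; it reads a fixed number of bytes and then checks that the separator is where it should be.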