I’ve been loading a lot of binary files recently using C/C++, and I’m bothered by how inelegant it can be. Either I get a lot of code that looks like this (I’ve since moved on):
    uint32_t type, k;
    uint32_t *variable;
    FILE *f;  /* assumed to have been opened elsewhere with fopen(path, "rb") */

    if (!fread(&type, 4, 1, f))
        goto boundsError;
    if (!fread(&k, 4, 1, f))
        goto boundsError;
    variable = malloc(4 * k);
    if (!fread(variable, 4 * k, 1, f))
        goto boundsError;
Or I define a local, packed struct so that I can read constant-sized blocks more easily. It seems to me, however, that such a simple problem (reading a file of a specified format into memory) could be solved more efficiently and more readably. Does anyone have any tips or tricks? To clarify, I’m not looking for a library to handle this; I might be tempted if I were designing my own file format and had to change the spec a lot, but for now I’m just looking for stylistic answers.
Also, some of you might suggest mmap. I love mmap and use it a lot, but it leads to nasty code for handling unaligned data, a problem that doesn’t really arise with stdio. In the end, I’d be writing stdio-like wrapper functions for reading from memory.
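For what it’s worth, such wrappers can be quite small. Here is a minimal sketch, assuming a hypothetical `mem_reader` cursor type and `mem_read` / `mem_read_u32` helpers (these names are made up for illustration and come from no existing library). Copying through `memcpy` sidesteps the alignment issue because it works from any byte offset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a memory-mapped (or otherwise loaded) buffer. */
typedef struct {
    const unsigned char *p;    /* next unread byte */
    const unsigned char *end;  /* one past the last valid byte */
} mem_reader;

/* fread-like helper: copies n bytes out of the buffer and advances the
   cursor. Returns 1 on success, 0 if fewer than n bytes remain.
   memcpy has no alignment requirement, so the source may sit at any offset. */
static int mem_read(mem_reader *r, void *dst, size_t n)
{
    assert(r != NULL && dst != NULL);
    if ((size_t)(r->end - r->p) < n)
        return 0;
    memcpy(dst, r->p, n);
    r->p += n;
    return 1;
}

/* Reads a 32-bit value in host byte order, like the fread calls above. */
static int mem_read_u32(mem_reader *r, uint32_t *out)
{
    return mem_read(r, out, sizeof *out);
}
```

With this, the call sites look much like the stdio version: `if (!mem_read_u32(&r, &type)) goto boundsError;`. Byte-order conversion is left out here on purpose, since the fread code does not do it either.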
Thanks!
EDIT: I should also clarify that I can’t change file formats—there’s a binary file that I have to read; I can’t request the data in another format.
If you want to deserialize binary data, one option is to define serialization macros for the structs that you want to use. This is a lot easier in C++ with template functions and streams. (Boost.Serialization supports non-intrusive serialization, but if you are willing to go intrusive, you can make it more elegant.)
Simple C macros:
Usage:
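As a rough illustration of both the macros and their usage, here is a minimal sketch. The macro names `READ_VAL` and `READ_ARRAY`, the `record` struct, and the `fail` label convention are all assumptions made up for this example, mirroring the type/count/array layout from the question; they are not a definitive implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper macros. Both expect a label named `fail` in the
   enclosing function and jump to it on a short read or failed allocation. */
#define READ_VAL(f, var) \
    do { \
        if (fread(&(var), sizeof (var), 1, (f)) != 1) goto fail; \
    } while (0)

#define READ_ARRAY(f, ptr, count) \
    do { \
        size_t n_ = (count); \
        /* calloc checks the size multiplication for overflow */ \
        (ptr) = calloc(n_ ? n_ : 1, sizeof *(ptr)); \
        if ((ptr) == NULL) goto fail; \
        if (fread((ptr), sizeof *(ptr), n_, (f)) != n_) goto fail; \
    } while (0)

/* Hypothetical record: a type tag, a count, then `k` 32-bit values. */
typedef struct {
    uint32_t type;
    uint32_t k;
    uint32_t *values;
} record;

/* Returns 0 on success, -1 on a short read or allocation failure. */
static int read_record(FILE *f, record *r)
{
    assert(f != NULL && r != NULL);
    r->values = NULL;
    READ_VAL(f, r->type);
    READ_VAL(f, r->k);
    READ_ARRAY(f, r->values, r->k);
    return 0;
fail:
    free(r->values);
    r->values = NULL;
    return -1;
}
```

The error handling lives in one place, and each field costs one line at the call site. Values are read in host byte order, as in the question's code.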
And, yes, serialization code is some of the most boring and brain-dead code to write. If you can, describe your data structures using metadata, and generate the code mechanically instead. There are tools and libs to help with this, or you can roll your own in Perl or Python or PowerShell or whatever.
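If an external generator feels like overkill, one way to stay inside C is the X-macro pattern, where a single field table plays the role of the metadata and the preprocessor does the generating. This is a swapped-in technique, not one of the tools alluded to above, and the `HEADER_FIELDS` table, the `header` struct, and its field names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical field table: one X(type, name) entry per fixed-size field,
   listed in file order. */
#define HEADER_FIELDS(X) \
    X(uint32_t, type)    \
    X(uint16_t, version) \
    X(uint16_t, flags)

/* Generate the struct definition from the table. */
#define AS_MEMBER(T, name) T name;
typedef struct { HEADER_FIELDS(AS_MEMBER) } header;

/* Generate the reader from the same table: one fread per field, so
   compiler-inserted struct padding never has to match the file layout. */
#define AS_READ(T, name) \
    if (fread(&h->name, sizeof h->name, 1, f) != 1) return -1;

/* Returns 0 on success, -1 on a short read. */
static int read_header(FILE *f, header *h)
{
    assert(f != NULL && h != NULL);
    HEADER_FIELDS(AS_READ)
    return 0;
}
```

Adding a field then means adding one line to the table, and the struct and the reader stay in sync. Variable-length parts (such as a counted array) still need hand-written code or a richer table.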