The problem I’m trying to solve is optimising the input of some 3rd-party code, which has the command line “program input_file output_file”. The 3rd-party code handles input_file with the standard fopen, fseek, fread, etc. I want to be able to use multiple input files, treating them as a single file, as if they were concatenated in the order they’re supplied. I have the 3rd-party source but want to modify it as little as possible. Currently I concatenate the files and then call the program with the concatenated file as input. I’m trying to eliminate the concatenation, because the files can be large and concatenating them takes time. Reading from stdin doesn’t do what I want, because the program writes stdin to a file to allow seeks.
The solution I’m working on is to accept the input_file command-line argument as a list of file names joined with a ‘?’ delimiter, and to include concat_stream.h at the start of the program source (after stdio.h is included). concat_stream.h transparently treats multiple streams as one stream by intercepting the standard calls, and implements the concatenated streams with some global arrays holding the streams and their accompanying data. Here’s a small portion of concat_stream.h as an example:
    FILE * fopen_concat_streams (char * filename, char * mode )
    {
        if( strchr(filename, '?')!=NULL )/*do we want a concat_stream?*/
            return concat_streams_init(filename, mode);/*setup concat_stream, return first stream as id*/
        else
            return fopen(filename, mode);/*standard library implementation*/
    }

    long int ftell_concat_streams( FILE * stream )
    {
        unsigned int index=is_stream_concat(stream);/*work out if stream refers to a concat_stream or regular stream*/
        if(index!=CONCAT_STREAMS_MAX)/*is stream a concat_stream?*/
        {
            ...
            return answer;/*work out and return location in concat_stream*/
        }
        else
            return ftell(stream);/*standard library implementation*/
    }

    #define fopen(x, y) fopen_concat_streams(x, y)
    #define ftell(x) ftell_concat_streams(x)
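For illustration, the step of turning the ‘?’-delimited argument into individual file names could look roughly like the sketch below. The names split_names and MAX_PARTS are hypothetical and are not part of concat_stream.h; the sketch also assumes that ‘?’ never appears inside a real file name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARTS 16 /* hypothetical limit on the number of input files */

/* Splits "a?b?c" into separate names (hypothetical helper).
   Returns the number of names, or 0 on allocation failure or if there are
   more than max_parts names. On success parts[] point into *copy_out,
   which the caller must free() when done. */
static size_t split_names(const char *arg, char *parts[], size_t max_parts,
                          char **copy_out)
{
    size_t n = 0;
    char *copy;
    char *p;

    *copy_out = NULL;
    copy = malloc(strlen(arg) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, arg);

    p = copy;
    for (;;) {
        if (n == max_parts) {      /* too many names for the table */
            free(copy);
            return 0;
        }
        parts[n++] = p;            /* record the start of this name */
        p = strchr(p, '?');        /* look for the next delimiter */
        if (p == NULL)
            break;
        *p++ = '\0';               /* terminate this name, step past '?' */
    }

    *copy_out = copy;
    return n;
}
```

Each resulting name can then be passed to the real fopen, so the splitting stays separate from the stream bookkeeping.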
My question is: am I on the right track, and is there an easier way to do it? If there’s a library that sorts this out for me, I’ll use that instead. It seems like it should be a popular thing to do, but I haven’t found anything so far. A totally different way to solve the initial problem would also be accepted; treating multiple streams as one is just my best guess at the easiest solution.
If you know the paths and sizes of all files, then this might work. What you are trying to achieve is a virtual file that is made up of all the individual parts.
You will need to create a data structure which contains the file handle and the offset (in the virtual file) of each file. Then you can search in this structure for the real file handle and calculate the correct offsets.
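A minimal sketch of such a structure is shown below. All names (vfile, vf_part, vf_add, vf_locate, vf_read, VF_MAX_PARTS) are hypothetical. It assumes the streams are opened in binary mode, that every size fits in a long, and that the files do not change while they are in use. Sizes are measured here with fseek/ftell, which is only one way to obtain them.

```c
#include <stdio.h>

#define VF_MAX_PARTS 16 /* hypothetical limit on the number of parts */

typedef struct {
    FILE *fp;    /* real file handle */
    long start;  /* offset of this part's first byte in the virtual file */
    long size;   /* size of this part in bytes */
} vf_part;

typedef struct {
    vf_part parts[VF_MAX_PARTS];
    size_t count; /* number of parts in use */
    long total;   /* total size of the virtual file */
    long pos;     /* current position in the virtual file */
} vfile;

/* Appends an already opened stream as the next part. Returns 0 on success. */
static int vf_add(vfile *vf, FILE *fp)
{
    long size;
    vf_part *p;

    if (fp == NULL || vf->count == VF_MAX_PARTS)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;
    p = &vf->parts[vf->count++];
    p->fp = fp;
    p->start = vf->total;
    p->size = size;
    vf->total += size;
    return 0;
}

/* Returns the index of the part containing virtual offset pos,
   or vf->count if pos is at or past the end of the virtual file. */
static size_t vf_locate(const vfile *vf, long pos)
{
    size_t i;
    for (i = 0; i < vf->count; i++)
        if (pos < vf->parts[i].start + vf->parts[i].size)
            return i;
    return vf->count;
}

/* Reads up to n bytes at vf->pos, continuing into the next part whenever
   a part is exhausted. Returns the number of bytes actually read. */
static size_t vf_read(vfile *vf, void *buf, size_t n)
{
    unsigned char *out = buf;
    size_t done = 0;

    while (done < n) {
        size_t i = vf_locate(vf, vf->pos);
        vf_part *p;
        long local;
        size_t avail, want, got;

        if (i == vf->count)
            break;                          /* end of the virtual file */
        p = &vf->parts[i];
        local = vf->pos - p->start;         /* offset inside the real file */
        avail = (size_t)(p->size - local);  /* bytes left in this part */
        want = (n - done < avail) ? n - done : avail;
        if (fseek(p->fp, local, SEEK_SET) != 0)
            break;
        got = fread(out + done, 1, want, p->fp);
        done += got;
        vf->pos += (long)got;
        if (got < want)
            break;                          /* short read or I/O error */
    }
    return done;
}
```

The loop in vf_read is there because a single read request can straddle the boundary between two files, so one request may have to be served by several real fread calls. A linear search in vf_locate should be fine for a handful of files; a binary search over the start offsets would be an option if there are many.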
Problems to be aware of:
- fread() call

Other options:
- If you don’t need fseek(), you can try to teach the code to understand - as an alias for stdin and use cat to concatenate the files: cat file1 file2 file3 | program - output
- Write a file system using the FUSE API. That’s not as scary as it sounds in your case. That would allow you to keep the original code unchanged. Instead, you’d use FUSE to make the files appear like one huge file.