I am getting some weird behaviour from protobuf binary file I/O. I am pre-processing a text corpus into a protobuf intermediate file. My serialization class looks as follows:
class pb_session_printer
{
public:
    pb_session_printer(std::string & filename)
        : out(filename.c_str(), std::fstream::out | std::fstream::trunc |
                                std::fstream::binary)
    {}

    void print_batch(std::vector<session> & pb_sv)
    {
        boost::lock_guard<boost::mutex> lock(m);
        BOOST_FOREACH(session & s, pb_sv)
        {
            // Offset in the file before this message is written.
            std::cout << out.tellg() << ":";
            s.SerializeToOstream(&out);
            out.flush();
            std::cout << s.session_id() << ":" << s.action_size() << std::endl;
        }
        exit(0);
    }

    std::fstream out;
    boost::mutex m;
};
A snippet of the output looks like this:
0:0:8
132:1:8
227:2:6
303:3:6
381:4:19
849:5:9
1028:6:2
1048:7:18
1333:8:28
2473:9:24
The first field is the stream offset before each write. It advances with every message, so serialization appears to be proceeding as normal.
When I run my loading program:
int main()
{
    std::fstream in_file("out_file", std::fstream::in | std::ios::binary);
    session s;
    std::cout << in_file.tellg() << std::endl;
    s.ParseFromIstream(&in_file);
    std::cout << in_file.tellg() << std::endl;
    std::cout << s.session_id() << std::endl;
    s.ParseFromIstream(&in_file);
}
I get:
0
-1
111
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type
"session" because it is missing required fields: session_id
session_id 111 belongs to an entry towards the end of the stream. I clearly don't understand the semantics of the library's binary I/O facilities. Please help.
If you write multiple protobuf messages to a single file, you need to write the size of each message followed by the message itself, and read them back in separately (so without ParseFromIstream, as Cat Plus Plus mentioned).
Once you have read a message's bytes into a buffer, you can parse it with ParseFromArray. Your file would look like this (the spaces are just for readability):
size protobuf size protobuf size protobuf etc.
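The reason this is needed is that the protobuf wire format does not mark where a message ends, so ParseFromIstream keeps reading until the stream is exhausted. A minimal sketch of the size-prefixed layout follows. The helper names write_framed and read_framed are hypothetical, and the 4-byte little-endian size prefix is an assumption chosen for simplicity; any fixed, agreed-upon encoding of the size would do. Plain strings stand in for serialized messages so the example compiles without generated code. In real use the payload would come from s.SerializeToString(&payload) on the writing side and go to s.ParseFromArray(payload.data(), payload.size()) on the reading side.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write one record: a 4-byte little-endian length, then the payload bytes.
// Assumption: in real use, `payload` is filled by s.SerializeToString(&payload).
bool write_framed(std::ostream& out, const std::string& payload)
{
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    char header[4];
    for (int i = 0; i < 4; ++i)
        header[i] = static_cast<char>((n >> (8 * i)) & 0xFF);
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(out);
}

// Read one record into `payload`. Returns false at end of stream or when a
// record is truncated. Assumption: in real use, follow a successful read with
// s.ParseFromArray(payload.data(), static_cast<int>(payload.size())).
bool read_framed(std::istream& in, std::string& payload)
{
    char header[4];
    if (!in.read(header, 4))
        return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(static_cast<unsigned char>(header[i])) << (8 * i);
    payload.resize(n);
    if (n > 0 && !in.read(&payload[0], static_cast<std::streamsize>(n)))
        return false;
    return true;
}

// Round trip through an in-memory stream: two records in, two records out,
// then a clean end of stream.
bool round_trip_demo()
{
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    const std::string second("sec\0ond", 7);  // payloads may contain zero bytes
    write_framed(buf, "first");
    write_framed(buf, second);

    std::string a, b, c;
    const bool ok = read_framed(buf, a) && read_framed(buf, b);
    const bool at_end = !read_framed(buf, c);
    assert(a == "first");
    return ok && at_end && b == second;
}
```

With the question's loader, this would mean calling read_framed in a loop on in_file and parsing each buffer into a fresh session, instead of calling ParseFromIstream twice. A production reader would probably also want to reject implausibly large sizes before resizing the buffer.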