So I have the following string of data, received over a TCP Winsock connection, and I would like to tokenize it into a vector of structs, where each struct represents one record.
std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";
struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};
Each record in the string is terminated by a newline ('\n'). Here is my attempt at splitting out the records, though not yet the fields:
#include <string>
#include <vector>

// Appends each non-empty '\n'-separated record in str to records.
void tokenize(const std::string& str, std::vector<std::string>& records)
{
    // Skip delimiters at beginning.
    std::string::size_type lastPos = str.find_first_not_of("\n", 0);
    // Find the end of the first token (the next delimiter).
    std::string::size_type pos = str.find_first_of("\n", lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        records.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of"
        lastPos = str.find_first_not_of("\n", pos);
        // Find the end of the next token.
        pos = str.find_first_of("\n", lastPos);
    }
}
It seems wasteful to repeat all of that code just to split each record on the colon (the field separator), fill in a struct, and push each struct into a vector. I'm sure there is a better way of doing this, or perhaps the design itself is wrong.
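One way to avoid repeating the loop is to make the delimiter set a parameter and call the same helper at both levels: once with "\n" for records, once with ":" for fields. This is a minimal sketch, not a library API; `split` and `parseRecords` are hypothetical names, and the field count of five is taken from `table_t`.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

// Hypothetical helper: the same find_first_not_of / find_first_of loop as
// tokenize(), with the delimiter set passed in. Runs of delimiters are
// skipped, so no empty tokens are produced.
std::vector<std::string> split(const std::string& str, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type lastPos = str.find_first_not_of(delims, 0);
    std::string::size_type pos = str.find_first_of(delims, lastPos);
    while (pos != std::string::npos || lastPos != std::string::npos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delims, pos);
        pos = str.find_first_of(delims, lastPos);
    }
    return tokens;
}

// Hypothetical two-level parse: records on '\n', then fields on ':'.
std::vector<table_t> parseRecords(const std::string& buf)
{
    std::vector<table_t> table;
    for (const std::string& record : split(buf, "\n"))
    {
        std::vector<std::string> f = split(record, ":");
        if (f.size() != 5)
            continue; // assumed policy: silently skip malformed records
        table.push_back(table_t{f[0], f[1], f[2], f[3], f[4]});
    }
    return table;
}
```

Like the original loop, `split` collapses consecutive delimiters, so a record with an empty field such as `44::adama:commander:stuff` yields only four fields and is skipped by the size check. If empty fields can occur, a split that keeps empty tokens (for example one built on `std::getline` with ':' as the delimiter) is the safer choice.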
Thank you for any help.
For breaking the string up into records, I'd use istringstream, if only
because that will simplify the changes later when you want to read from
a file. For tokenizing, the most obvious solution is boost::regex, so:
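A minimal sketch of that approach, assuming C++11 `std::regex` in place of `boost::regex` (the two interfaces are very close) and a standard library with a working `<regex>` (for GCC, 4.9 or later). `parse` is a hypothetical name; taking a `std::istream&` means the same function accepts a `std::istringstream` built from `buf` now and a `std::ifstream` later.

```cpp
#include <cassert>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;

    // The assumed "logical constructor": one argument per field.
    table_t(const std::string& k, const std::string& f, const std::string& l,
            const std::string& r, const std::string& a)
        : key(k), first(f), last(l), rank(r), additional(a)
    {
    }
};

// Reads '\n'-terminated records from any istream and splits each one into
// exactly five ':'-separated fields. Empty fields are allowed.
std::vector<table_t> parse(std::istream& in)
{
    static const std::regex fields("([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)");
    std::vector<table_t> result;
    std::string line;
    while (std::getline(in, line))
    {
        std::smatch m;
        if (std::regex_match(line, m, fields))
            result.push_back(table_t(m[1].str(), m[2].str(), m[3].str(),
                                     m[4].str(), m[5].str()));
        // Non-matching lines (blank lines, wrong field count) are skipped;
        // real code might report them instead.
    }
    return result;
}
```

Usage would be along the lines of `std::istringstream in(buf); std::vector<table_t> table = parse(in);`. If the sender uses Windows line endings (`\r\n`), the `\r` ends up at the end of the last field, because `[^:]*` matches it; stripping a trailing `\r` from each line before matching handles that.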
(I've assumed the logical constructor for table_t. Also: there's a very
long tradition in C that names ending in _t are typedefs, and POSIX
reserves such names for the implementation, so you're probably better
off finding some other convention.)