So I have the following string of data, which is being received through a

Asked: May 20, 2026

So I have the following string of data, which is being received through a TCP Winsock connection, and I would like to tokenize it into a vector of structs, where each struct represents one record.

std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

Each record in the string is delimited by a newline ('\n'). Here is my attempt at splitting up the records, but not yet splitting up the fields:

void tokenize(const std::string& str, std::vector<std::string>& records)
{
    // Skip delimiters at beginning.
    std::string::size_type lastPos = str.find_first_not_of("\n", 0);
    // Find the end of the first token (the next delimiter).
    std::string::size_type pos     = str.find_first_of("\n", lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        records.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of("\n", pos);
        // Find the end of the next token.
        pos = str.find_first_of("\n", lastPos);
    }
}

It seems totally unnecessary to repeat all of that code again to further tokenize each record via the colon (internal field separator) into the struct and push each struct into a vector. I’m sure there is a better way of doing this, or perhaps the design is in itself wrong.

Thank you for any help.

1 Answer

Editorial Team · Answer 1 · 2026-05-20T23:29:59+00:00

For breaking the string up into records, I’d use istringstream, if only
because that will simplify the changes later when I want to read from
a file. For tokenizing, the most obvious solution is boost::regex, so:

std::vector<table_t> parse( std::istream& input )
{
    std::vector<table_t> retval;
    std::string line;
    while ( std::getline( input, line ) ) {
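        // Five colon-separated fields; boost::regex defaults to Perl
        // syntax, so capture groups are plain parentheses.
        static boost::regex const pattern(
            "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)" );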
        boost::smatch matched;
        if ( !regex_match( line, matched, pattern ) ) {
            //  Error handling...
        } else {
            retval.push_back(
                table_t( matched[1], matched[2], matched[3],
                         matched[4], matched[5] ) );
        }
    }
    return retval;
}

(I’ve assumed the logical constructor for table_t, one taking the five
fields as strings. Also: there’s a very long tradition in C that names
ending in _t are typedefs, so you’re probably better off finding some
other convention.)
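
For anyone without Boost at hand, here is a rough sketch of the same idea using std::regex from C++11 instead of boost::regex. That is a substitution on my part, not what the code above uses. It also spells out the constructor assumed above; its exact signature is my guess at "the logical constructor", and skipping malformed lines silently is just a placeholder for real error handling.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Same fields as the struct in the question, plus the assumed
// five-string constructor (hypothetical signature).
struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;

    table_t( std::string const& k, std::string const& f,
             std::string const& l, std::string const& r,
             std::string const& a )
        : key( k ), first( f ), last( l ), rank( r ), additional( a )
    {
    }
};

std::vector<table_t> parse( std::istream& input )
{
    // Five colon-separated fields, none of which may contain ':'.
    static std::regex const pattern(
        "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)" );
    std::vector<table_t> retval;
    std::string line;
    while ( std::getline( input, line ) ) {
        std::smatch matched;
        if ( std::regex_match( line, matched, pattern ) ) {
            // Each sub_match converts implicitly to std::string.
            retval.push_back(
                table_t( matched[1], matched[2], matched[3],
                         matched[4], matched[5] ) );
        }
        // Lines that do not match are skipped in this sketch;
        // real code would report them.
    }
    return retval;
}
```

To parse the buffer from the question, wrap it in a std::istringstream and pass that to parse(); the same function then works unchanged on a std::ifstream later.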
