The Archive Base

Editorial Team
Asked: June 11, 2026

How can you use stringstream to tokenize a line that looks like this?

[label] opcode [arg1] [,arg2]

The label is optional; when it is absent, the line starts with whitespace instead. The opcode is always present and is separated from arg1 by a space or tab. arg1 and arg2 are separated by a comma, with no whitespace between them.

Also, some blank lines contain only whitespace, so they need to be discarded. A line starting with '#' is a comment.

So for instance:

#Sample Input
TOP  NOP
     L   2,1
VAL  INT  0

This is just an example of the text file I’ll be reading from. For line one, the label would be TOP and the opcode would be NOP, with no arguments being passed.

I’ve been working on it, but I need a simpler way to tokenize. From what I’ve seen, stringstream seems to be the tool I’d like to use, so if anyone can show me roughly how to do this, I’d really appreciate it.

I’ve been racking my brain on how to do this, and to show that I’m not asking without having tried, here is my current code:

int counter = 0;
int i = 0;
int j = 0;
int p = 0;

while (getline(myFile, line, '\n'))
{


    if (line[0] == '#')
    {
        continue;
    }

    if (line.length() == 0)
    {
        continue;
    }

    if (line.empty())
    {
        continue;
    }

    // If the first letter isn't a tab or space then it's a label

    if (line[0] != '\t' && line[0] != ' ')
    {

        string delimeters = "\t ";

        int current;
        int next = -1;


        current = next + 1;
        next = line.find_first_of( delimeters, current);
        label = line.substr( current, next - current );

        Symtablelab[i] = label;
        Symtablepos[i] = counter;

        if(next>0)
        {
            current = next + 1;
            next = line.find_first_of(delimeters, current);
            opcode = line.substr(current, next - current);


            if (opcode != "WORDS" && opcode != "INT")
            {
                counter += 3;
            }

            if (opcode == "INT")
            {
                counter++;
            }

            if (next > 0)
            {
                delimeters = ", \n\t";
                current = next + 1;
                next = line.find_first_of(delimeters, current);
                arg1 = line.substr(current, next-current);

                if (opcode == "WORDS")
                {
                    counter += atoi(arg1.c_str());
                }
            }

            if (next > 0)
            {
                delimeters ="\n";
                current = next +1;
                next = line.find_first_of(delimeters,current);
                arg2 = line.substr(current, next-current);

            }
        }

        i++;

    }

    // If the first character is a tab or space then there is no label and we just need to get a counter
    if (line[0] == '\t' || line[0] == ' ')
    {
        string delimeters = "\t \n";
        int current;
        int next = -1;
        current = next + 1;
        next = line.find_first_of( delimeters, current);
        label = line.substr( current, next - current );

    if(next>=0)
        {
            current = next + 1;
            next = line.find_first_of(delimeters, current);
            opcode = line.substr(current, next - current);

            if (opcode == "\t" || opcode =="\n"|| opcode ==" ")
            {
                continue;
            }

            if (opcode != "WORDS" && opcode != "INT")
            {
                counter += 3;
            }

            if (opcode == "INT")
            {
                counter++;
            }


            if (next > 0)
            {
                delimeters = ", \n\t";
                current = next + 1;
                next = line.find_first_of(delimeters, current);
                arg1 = line.substr(current, next-current);

                if (opcode == "WORDS")
                {
                    counter += atoi(arg1.c_str());
                }

            }



            if (next > 0)
            {
                delimeters ="\n\t ";
                current = next +1;
                next = line.find_first_of(delimeters,current);
                arg2 = line.substr(current, next-current);

            }
        }

    }
}

myFile.clear();
myFile.seekg(0, ios::beg);

while(getline(myFile, line))
{
    if (line.empty())
    {
        continue;
    }

    if (line[0] == '#')
    {
        continue;
    }

    if (line.length() == 0)
    {
        continue;
    }



    // If the first letter isn't a tab or space then it's a label

    if (line[0] != '\t' && line[0] != ' ')
    {

        string delimeters = "\t ";

        int current;
        int next = -1;


        current = next + 1;
        next = line.find_first_of( delimeters, current);
        label = line.substr( current, next - current );


        if(next>0)
        {
            current = next + 1;
            next = line.find_first_of(delimeters, current);
            opcode = line.substr(current, next - current);



            if (next > 0)
            {
                delimeters = ", \n\t";
                current = next + 1;
                next = line.find_first_of(delimeters, current);
                arg1 = line.substr(current, next-current);

            }

            if (next > 0)
            {
                delimeters ="\n\t ";
                current = next +1;
                next = line.find_first_of(delimeters,current);
                arg2 = line.substr(current, next-current);

            }
        }

        if (opcode == "INT")
        {
            memory[p] = arg1;
            p++;
            continue;
        }

        if (opcode == "HALT" || opcode == "NOP" || opcode == "P_REGS")
        {
            memory[p] = opcode;
            p+=3;
            continue;
        }

        if(opcode == "J" || opcode =="JEQR" || opcode == "JNE" || opcode == "JNER" || opcode == "JLT" || opcode == "JLTR" || opcode == "JGT" || opcode == "JGTR" || opcode == "JLE" || opcode == "JLER" || opcode == "JGE" || opcode == "JGER" || opcode == "JR")
        {
            memory[p] = opcode;
            memory[p+1] = arg1;
            p+=3;
            continue;
        }

        if (opcode == "WORDS")
        {
            int l = atoi(arg1.c_str());
            for (int k = 0; k <= l; k++)
            {
                memory[p+k] = "0";
            }

            p+=l;
            continue;
        }

        else
        {
            memory[p] = opcode;
            memory[p+1] = arg1;
            memory[p+2] = arg2;
            p+=3;
        }

    }

    // If the first character is a tab or space then there is no label and we just need to get a counter        


    if (line[0] == '\t' || line[0] == ' ')
    {
        string delimeters = "\t ";
        int current;
        int next = -1;
        current = next + 1;
        next = line.find_first_of( delimeters, current);
        label = line.substr( current, next - current );

    if(next>=0)
        {
            current = next + 1;
            next = line.find_first_of(delimeters, current);
            opcode = line.substr(current, next - current);

            if (opcode == "\t" || opcode =="\n"|| opcode ==" "|| opcode == "")
            {
                continue;
            }



            if (next > 0)
            {
                delimeters = ", \n\t";
                current = next + 1;
                next = line.find_first_of(delimeters, current);
                arg1 = line.substr(current, next-current);

            }



            if (next > 0)
            {
                delimeters ="\n\t ";
                current = next +1;
                next = line.find_first_of(delimeters,current);
                arg2 = line.substr(current, next-current);

            }
        }

        if (opcode == "INT")
        {
            memory[p] = arg1;
            p++;
            continue;
        }

        if (opcode == "HALT" || opcode == "NOP" || opcode == "P_REGS")
        {
            memory[p] = opcode;
            p+=3;
            continue;
        }

        if(opcode == "J" || opcode =="JEQR" || opcode == "JNE" || opcode == "JNER" || opcode == "JLT" || opcode == "JLTR" || opcode == "JGT" || opcode == "JGTR" || opcode == "JLE" || opcode == "JLER" || opcode == "JGE" || opcode == "JGER" || opcode == "JR")
        {
            memory[p] = opcode;
            memory[p+1] = arg1;
            p+=3;
            continue;
        }

        if (opcode == "WORDS")
        {
            int l = atoi(arg1.c_str());
            for (int k = 0; k <= l; k++)
            {
                memory[p+k] = "0";
            }

            p+=l;

            continue;
        }

        else
        {
            memory[p] = opcode;
            memory[p+1] = arg1;
            memory[p+2] = arg2;
            p+=3;
        }
    }
}

I would obviously like to make this much, much better, so any help would be greatly appreciated.

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 9:46 pm

    Before you go mad maintaining those huge if statements or trying to learn Boost Spirit, let’s try to write a very simple parser. This is a bit of a long post and doesn’t get directly to the point, so please bear with me.

    First, we need a grammar, which seems to be dead simple:

        line
              label(optional)   opcode   argument-list(optional)
    
        argument-list
              argument
              argument, argument-list
    

    In English: a line of code consists of an optional label, an opcode, and an optional argument list. An argument list is either a single argument (an integer) or an argument followed by a separator (a comma) and another argument list.
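
    To make the grammar concrete, here is a minimal sketch of the two stringstream idioms everything below relies on. The helper names split_whitespace and split_commas are made up for illustration and are not part of the design that follows.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one source line into whitespace-separated tokens.
// operator>> skips leading spaces and tabs, so a line that starts
// with whitespace (no label) needs no special handling at this stage.
std::vector<std::string> split_whitespace(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

// Split a comma-separated argument token such as "2,1" into its parts.
std::vector<std::string> split_commas(const std::string& args)
{
    std::istringstream in(args);
    std::vector<std::string> parts;
    std::string part;
    while (std::getline(in, part, ',')) parts.push_back(part);
    return parts;
}
```

    With the sample input, "     L   2,1" splits into the tokens "L" and "2,1", and "2,1" in turn splits into "2" and "1". Deciding whether the first token is a label or an opcode is a separate step, because whitespace splitting alone cannot tell them apart.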

    Let’s first define two data structures. Labels are supposed to be unique (right?), so we’ll keep them in a set of strings; that lets us look them up at any time and report an error if we find a duplicate label. The second is a map from strings to size_t, which acts as a symbol table of valid opcodes together with the expected number of arguments for each opcode.

    std::set<std::string> labels;
    std::map<std::string, size_t> symbol_table = {
        { "INT", 1},
        { "NOP", 0},
        { "L",   2}
    };
    

    I don’t know what exactly memory is in your code, but your way of calculating offsets to figure out where to put arguments seems unnecessarily complicated. Let’s define a data structure that can elegantly hold a line of code instead. I’d do something like this:

    typedef std::vector<int> arg_list;
    
    struct code_line {
        code_line() : label(), opcode(), args() {}
        std::string  label;      // labels are optional, so an empty string
                                 // will mean absence of label
        std::string  opcode;     // opcode, doh
        arg_list     args;       // variable number of arguments, it can be empty, too.
                                 // It needs to match with opcode, we'll deal with
                                 // that later
    };
    

    A syntax error is an exceptional circumstance that’s not easily recoverable, so let’s deal with it by throwing an exception. Our simple exception class can look like this:

    struct syntax_error {
        syntax_error(std::string m) : msg(m) { }
        std::string msg;
    };
    

    Tokenizing, lexing and parsing are usually separate tasks. But for this simple example, we can combine the tokenizer and lexer in one class. We already know the elements our grammar is made of, so let’s write a class that takes input as text and extracts grammar elements from it. The interface could look like this:

    class token_stream {
        std::istringstream stream; // stringstream for input
        std::string buffer;        // a buffer for a token, more on this later
    public:
        token_stream(std::string str) : stream(str), buffer() { }
    
        // these methods are self-explanatory
        std::string get_label();
        std::string get_opcode();
        arg_list get_arglist();
    
        // we're taking a kind of top-down approach with this,
        // so let's forget about implementations for now
    };
    

    And the workhorse, a function that tries to make sense of the tokens and returns a code_line struct if everything goes fine:

    code_line parse(std::string line)
    {
        code_line temp;
        token_stream stream(line);
    
        // Again, self-explanatory, get a label, opcode and argument list from
        // token stream.
    
        temp.label = stream.get_label();
        temp.opcode = stream.get_opcode();
        temp.args = stream.get_arglist();
    
        // Everything went fine so far, remember we said we'd be throwing exceptions
        // in case of syntax errors.
    
        // Now we can check if we got the correct number of arguments for the given opcode.
        // find() is used instead of operator[], which would silently insert an unknown
        // opcode into the table with an argument count of 0.
    
        auto entry = symbol_table.find(temp.opcode);
        if (entry == symbol_table.end()) {
            throw syntax_error("Unknown opcode.");
        }
        if (entry->second != temp.args.size()) {
            throw syntax_error("Wrong number of parameters.");
        }
    
        // The last thing, if there's a label in the line, we insert it in the table.
        // We couldn't do that inside the get_label method, because at that time
        // we didn't yet know if the rest of the line is syntactically valid, and an
        // exception thrown would have left us with a "dangling" label in the table.
    
        if (!temp.label.empty()) labels.insert(temp.label);
    
        return temp;
    }
    

    And here’s how we might use all this:

    int main()
    {
        std::string line;
        std::vector<code_line> code;
    
        while (std::getline(std::cin, line)) {
    
            // empty line or a comment, ignore it
            if (line.empty() || line[0] == '#') continue;
    
            try {
                code.push_back(parse(line));
            } catch (syntax_error& e) {
                std::cout << e.msg << '\n';
    
                // Give up, try again, log... up to you.
            }
        }
    }
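
    The question also mentions blank lines that contain only whitespace, which line.empty() does not catch. A minimal sketch of a check that covers them follows; the name is_skippable is hypothetical, and treating '#' after leading whitespace as a comment (and ignoring a stray '\r') are assumptions on my part.

```cpp
#include <string>

// Returns true for lines that carry no code: empty lines, lines made
// only of spaces/tabs, and comment lines.
// Assumption: a comment is a line whose first non-blank character is '#'.
// Assumption: a trailing '\r' from Windows line endings counts as blank.
bool is_skippable(const std::string& line)
{
    const std::string::size_type first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) return true;  // empty or whitespace only
    return line[first] == '#';                    // comment line
}
```

    In the loop above, the test could then read `if (is_skippable(line)) continue;`.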
    

    If the input was successfully parsed, we now have a vector of valid lines with all the info (labels, number of arguments) and can do pretty much anything we like with it. This code will be much easier to maintain and extend than yours, IMO. If you need to introduce a new opcode, for example, just add another entry to the map (symbol_table). How’s that compared to your if statements? 🙂

    The only thing left is the actual implementation of the token_stream methods. Here’s how I did it for get_label:

    std::string token_stream::get_label()
    {
        std::string temp;
    
        // Unless the stream is empty (and it shouldn't be, we checked that in main),
        // operator>> for std::string is unlikely to fail. It doesn't hurt to be robust
        // with error checking, though
    
        if (!(stream >> temp)) throw syntax_error("Fatal error, empty line, bad stream?");
    
        // Ok, we got something. First we should check if the string consists of valid
        // characters - you probably don't want punctuation characters and such in a label.
        // I leave this part out for simplicity.
    
        // Since labels are optional, we need to check if the token is an opcode.
        // If that's the case, we return an empty (no) label.
    
        if (symbol_table.find(temp) != symbol_table.end()) {
            buffer = temp;
            return "";
        }
    
        // Note that above is where that `buffer` member of token_stream class got used.
        // If the token was an opcode, we needed to save it so get_opcode method can make
        // use of it. The other option would be to put the string back in the underlying 
        // stringstream, but that's more work and more code. This way, get_opcode needs   
        // to check if there's anything in buffer and use it, or otherwise extract from
        // the stringstream normally.
    
        // Check if the label was used before:
    
        if (labels.count(temp))
            throw syntax_error("Label already used.");
    
        return temp;
    }
    

    And that’s it. I leave the rest of the implementation as an exercise for you. Hope it helped. 🙂
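
    For readers who want to check their own attempt, here is one possible sketch of the two remaining methods. It is not the only way to do it. The error messages, the rejection of trailing commas and of extra tokens after the arguments are my assumptions, and get_label is condensed (no duplicate-label check) just to keep the block self-contained.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<int> arg_list;

struct syntax_error {
    syntax_error(std::string m) : msg(m) { }
    std::string msg;
};

// Same table as above: opcode -> expected number of arguments.
std::map<std::string, std::size_t> symbol_table = {
    { "INT", 1 }, { "NOP", 0 }, { "L", 2 }
};

class token_stream {
    std::istringstream stream;
    std::string buffer;   // holds an opcode that get_label() read ahead
public:
    token_stream(std::string str) : stream(str), buffer() { }
    std::string get_label();
    std::string get_opcode();
    arg_list get_arglist();
};

// Condensed version of get_label from above (duplicate check omitted).
std::string token_stream::get_label()
{
    std::string temp;
    if (!(stream >> temp)) throw syntax_error("Empty line.");
    if (symbol_table.find(temp) != symbol_table.end()) {
        buffer = temp;    // it was an opcode, so there is no label
        return "";
    }
    return temp;
}

std::string token_stream::get_opcode()
{
    std::string temp;
    if (!buffer.empty()) {
        temp.swap(buffer);            // get_label() already consumed it
    } else if (!(stream >> temp)) {
        throw syntax_error("Missing opcode.");
    }
    if (symbol_table.find(temp) == symbol_table.end())
        throw syntax_error("Unknown opcode: " + temp);
    return temp;
}

arg_list token_stream::get_arglist()
{
    arg_list args;
    std::string rest;
    if (!(stream >> rest)) return args;   // no arguments on this line

    std::string junk;
    if (stream >> junk) throw syntax_error("Unexpected token: " + junk);
    if (rest.back() == ',') throw syntax_error("Trailing comma.");

    // Arguments are comma-separated with no whitespace in between.
    std::istringstream items(rest);
    std::string item;
    while (std::getline(items, item, ',')) {
        std::istringstream number(item);
        int value;
        char extra;
        // Reject empty items ("2,,1") and non-numeric text ("2,x", "3abc").
        if (!(number >> value) || (number >> extra))
            throw syntax_error("Bad argument: " + item);
        args.push_back(value);
    }
    return args;
}
```

    Called in the order label, opcode, argument list, the line "     L   2,1" yields an empty label, the opcode "L" and the arguments 2 and 1, while "TOP  NOP" yields the label "TOP", the opcode "NOP" and an empty argument list.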
