Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 717043
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 14, 20262026-05-14T05:22:02+00:00 2026-05-14T05:22:02+00:00

For all you compiler gurus, I wanna write a recursive descent parser and I

  • 0

For all you compiler gurus, I wanna write a recursive descent parser and I wanna do it with just code. No generating lexers and parsers from some other grammar and don’t tell me to read the dragon book, i’ll come around to that eventually.

I wanna get into the gritty details about implementing a lexer and parser for a reasonable simple language, say CSS. And I wanna do this right.

This will probably end up being a series of questions but right now I’m starting with a lexer. Tokenization rules for CSS can be found here.

I find my self writing code like this (hopefully you can infer the rest from this snippet):

public CssToken ReadNext()
{
    int val;
    while ((val = _reader.Read()) != -1)
    {
        var c = (char)val;
        switch (_stack.Top)
        {
            case ParserState.Init:
                if (c == ' ')
                {
                    continue; // ignore
                }
                else if (c == '.')
                {
                    _stack.Transition(ParserState.SubIdent, ParserState.Init);
                }
                break;

            case ParserState.SubIdent:
                if (c == '-')
                {
                    _token.Append(c);
                }
                _stack.Transition(ParserState.SubNMBegin);
                break;

What is this called? and how far off am I from something reasonable well understood? I’m trying to balance something which is fair in terms of efficiency and easy to work with, using a stack to implement some kind of state machine is working quite well, but I’m unsure how to continue like this.

What I have is an input stream, from which I can read 1 character at a time. I don’t do any look a head right now, I just read the character then depending on the current state try to do something with that.

I’d really like to get into the mind set of writing reusable snippets of code. This Transition method is currently means to do that, it will pop the current state of the stack and then push the arguments in reverse order. That way, when I write Transition(ParserState.SubIdent, ParserState.Init) it will “call” a sub routine SubIdent which will, when complete, return to the Init state.

The parser will be implemented in much the same way, currently, having everything in a single big method like this allows me to easily return a token when I found one, but it also forces me to keep everything in one single big method. Is there a nice way to split these tokenization rules into separate methods?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-14T05:22:03+00:00Added an answer on May 14, 2026 at 5:22 am

    What you’re writing is called a pushdown automaton. This is usually more power than you need to write a lexer, it’s certainly excessive if you’re writing a lexer for a modern language like CSS. A recursive descent parser is close in power to a pushdown automaton, but recursive descent parsers are much easier to write and to understand. Most parser generators generate pushdown automatons.

    Lexers are almost always written as finite state machines, i.e., like your code except get rid of the “stack” object. Finite state machines are closely related to regular expressions (actually, they’re provably equivalent to one another). When designing such a parser, one usually starts with the regular expressions and uses them to create a deterministic finite automaton, with some extra code in the transitions to record the beginning and end of each token.

    There are tools to do this. The lex tool and its descendants are well known and have been translated into many languages. The ANTLR toolchain also has a lexer component. My preferred tool is ragel on platforms that support it. There is little benefit to writing a lexer by hand most of the time, and the code generated by these tools will probably be faster and more reliable.

    If you do want to write your own lexer by hand, good ones often look something like this:

    function readToken() // note: returns only one token each time
        while !eof
            c = peekChar()
            if c in A-Za-z
                return readIdentifier()
            else if c in 0-9
                return readInteger()
            else if c in ' \n\r\t\v\f'
                nextChar()
            ...
        return EOF
    
    function readIdentifier()
        ident = ""
        while !eof
            c = nextChar()
            if c in A-Za-z0-9
                ident.append(c)
            else
                return Token(Identifier, ident)
                // or maybe...
                return Identifier(ident)
    

    Then you can write your parser as a recursive descent parser. Don’t try to combine lexer / parser stages into one, it leads to a total mess of code. (According to the Parsec author, it’s slower, too).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.