Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8212695
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 7, 20262026-06-07T10:50:50+00:00 2026-06-07T10:50:50+00:00

Every time I write a simple lexer and parser, I stumble upon the same

  • 0

Every time I write a simple lexer and parser, I stumble upon the same question: how should the lexer and the parser communicate? I see four different approaches:

  1. The lexer eagerly converts the entire input string into a vector of tokens. Once this is done, the vector is fed to the parser which converts it into a tree. This is by far the simplest solution to implement, but since all tokens are stored in memory, it wastes a lot of space.

  2. Each time the lexer finds a token, it invokes a function on the parser, passing the current token. In my experience, this only works if the parser can naturally be implemented as a state machine like LALR parsers. By contrast, I don’t think it would work at all for recursive descent parsers.

  3. Each time the parser needs a token, it asks the lexer for the next one. This is very easy to implement in C# due to the yield keyword, but quite hard in C++ which doesn’t have it.

  4. The lexer and parser communicate through an asynchronous queue. This is commonly known under the title “producer/consumer”, and it should simplify the communication between the lexer and the parser a lot. Does it also outperform the other solutions on multicores? Or is lexing too trivial?

Is my analysis sound? Are there other approaches I haven’t thought of? What is used in real-world compilers? It would be really cool if compiler writers like Eric Lippert could shed some light on this issue.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-07T10:50:52+00:00Added an answer on June 7, 2026 at 10:50 am

    While I wouldn’t classify much of the above as incorrect, I do believe several items are misleading.

    1. Lexing an entire input before running a parser has many advantages over other options. Implementations vary, but in general the memory required for this operation is not a problem, especially when you consider the type of information that you’d like to have available for reporting compilation errors.

      • Benefits
        • Potentially more information available during error reporting.
        • Languages written in a way that allows lexing to occur before parsing are easier to specify and write compilers for.
      • Drawbacks
        • Some languages require context-sensitive lexers that simply cannot operate before the parsing phase.

      Language implementation note: This is my preferred strategy, as it results in separable code and is best suited for translation to implementing an IDE for the language.

      Parser implementation note: I experimented with ANTLR v3 regarding memory overhead with this strategy. The C target uses over 130 bytes per token, and the Java target uses around 44 bytes per token. With a modified C# target, I showed it’s possible to fully represent the tokenized input with only 8 bytes per token, making this strategy practical for even quite large source files.

      Language design note: I encourage people designing a new language to do so in a way that allows this parsing strategy, whether or not they end up choosing it for their reference compiler.

    2. It appears you’ve described a “push” version of what I generally see described as a “pull” parser like you have in #3. My work emphasis has always been on LL parsing, so this wasn’t really an option for me. I would be surprised if there are benefits to this over #3, but cannot rule them out.

    3. The most misleading part of this is the statement about C++. Proper use of iterators in C++ makes it exceptionally well suited to this type of behavior.

    4. A queue seems like a rehash of #3 with a middleman. While abstracting independent operations has many advantages in areas like modular software development, a lexer/parser pair for a distributable product offering is highly performance-sensitive, and this type of abstraction removes the ability to do certain types of optimization regarding data structure and memory layout. I would encourage the use of option #3 over this.

      As an additional note on multi-core parsing: The initial lexer/parser phases of compilation for a single compilation unit generally cannot be parallelized, nor do they need to be considering how easy it is to simply run parallel compilation tasks on different compilation units (e.g. one lexer/parser operation on each source file, parallelizing across the source files but only using a single thread for any given file).

    Regarding other options: For a compiler intended for widespread use (commercial or otherwise), generally implementers choose a parsing strategy and implementation which provides the best performance under the constraints of the target language. Some languages (e.g. Go) can be parsed exceptionally quickly with a simple LR parsing strategy, and using a “more powerful” parsing strategy (read: unnecessary features) would only serve to slow things down. Other languages (e.g. C++) are extremely challenging or impossible to parse with typical algorithms, so slower but more powerful/flexible parsers are employed.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I decided to write a simple RSA encryption implementation in Python, but every time
Every time I write a program of the form below using LINQ to SQL,
I need to have a newline every time I write to a file in
I have a RichTextBox that I write a string to every time I click
If we must connect the db (redis)every time we need to write to or
every time I open my Solution in Visual Studio it tries to communicate and
This is probably a pretty simple question. In C# I'm trying to write a
The MySQL table is simple: id | name | create_time Every time the user
I have a little problem with my text fonts every time I write a
It seems every time I go to write a recursive function I end up

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.