The Archive Base


Editorial Team
Asked: May 30, 2026

I think I understand (roughly) how recursive descent parsers (e.g. Scala’s Parser Combinators) work:

I think I understand (roughly) how recursive descent parsers (e.g. Scala’s Parser Combinators) work: you parse the input string with one parser, and that parser calls other, smaller parsers for each “part” of the whole input, and so on, until you reach the low-level parsers, which directly generate the AST from fragments of the input string.

I also think I understand how Lexing/Parsing works: you first run a lexer to break the whole input into a flat list of tokens, and you then run a parser to take the token list and generate an AST.

However, what I do not understand is how the Lex/Parse strategy deals with cases where exactly how you tokenize something depends on the tokens that came before it. For example, if I take a chunk of XML:

"<tag attr='moo' omg='wtf'>attr='moo' omg='wtf'</tag>"

A recursive descent parser may take this and break it down as follows (each subsequent indent represents the decomposition of the parent string):

"<tag attr='moo' omg='wtf'>attr='moo' omg='wtf'</tag>" 
  -> "<tag attr='moo' omg='wtf'>"
       -> "<tag"
       -> "attr='moo'"
            -> "attr"
            -> "="
            -> "moo"
       -> "omg='wtf'"
            -> "omg"
            -> "="
            -> "wtf" 
       -> ">"
  -> "attr='moo' omg='wtf'"
  -> "</tag>"

The small parsers that individually parse <tag, attr='moo', etc. would then construct a representation of an XML tag and add attributes to it.

However, how does a single-step Lex/Parse work? How does the Lexer know that the string after <tag and before > must be tokenized into separate attributes, while the string between > and </tag> does not need to be? Wouldn’t it need the Parser to tell it that the first string is within a tag body, and the second case is outside a tag body?

EDIT: Changed the example to make it clearer

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 2:08 pm

    Typically the lexer has a “mode” or “state” setting that changes according to the input. For example, on seeing a < character, the lexer switches to “tag” mode and tokenizes names, = signs, and quoted values until it sees a >. It then enters “contents” mode and returns all of attr='moo' omg='wtf' as a single string token. Programming language lexers handle string literals the same way:

    string s1 = "y = x+5";
    

    The y = x+5 would never be handled as a mathematical expression and then turned back into a string. It’s recognized as a string literal, because the " changes the lexer mode.
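The mode switch described above can be sketched for the XML fragment from the question. This is only a minimal illustration, not how any particular lexer generator does it: the function name `tokenize`, the two mode names, and the token kinds (`OPEN_TAG_START`, `NAME`, `TEXT`, and so on) are all assumptions made up for this example, and it ignores comments, entities, and CDATA.

```python
import re

# Token patterns that only apply in "tag" mode (between '<' and '>').
# The kind names are hypothetical, chosen for this sketch.
TAG_PATTERNS = [
    ("TAG_END", re.compile(r"/?>")),
    ("EQUALS", re.compile(r"=")),
    ("STRING", re.compile(r"'[^']*'|\"[^\"]*\"")),
    ("NAME", re.compile(r"[A-Za-z_][\w.-]*")),
]


def tokenize(source):
    """Yield (kind, text) pairs, switching between 'contents' and 'tag' mode."""
    mode = "contents"
    pos = 0
    while pos < len(source):
        if mode == "contents":
            # In contents mode everything up to the next '<' is one TEXT token.
            end = source.find("<", pos)
            if end == -1:
                end = len(source)
            if end > pos:
                yield ("TEXT", source[pos:end])
                pos = end
            elif source.startswith("</", pos):
                yield ("CLOSE_TAG_START", "</")
                pos += 2
                mode = "tag"  # '<' flips the lexer into tag mode
            else:
                yield ("OPEN_TAG_START", "<")
                pos += 1
                mode = "tag"
        else:
            # In tag mode whitespace separates tokens and is discarded.
            if source[pos].isspace():
                pos += 1
                continue
            for kind, pattern in TAG_PATTERNS:
                match = pattern.match(source, pos)
                if match:
                    yield (kind, match.group())
                    pos = match.end()
                    if kind == "TAG_END":
                        mode = "contents"  # '>' flips back to contents mode
                    break
            else:
                raise SyntaxError(
                    f"unexpected character {source[pos]!r} at offset {pos}"
                )


if __name__ == "__main__":
    for token in tokenize("<tag attr='moo' omg='wtf'>attr='moo' omg='wtf'</tag>"):
        print(token)
```

The same characters, attr='moo' omg='wtf', come out as six separate tokens inside the tag and as one TEXT token between the tags. The only thing that differs is the lexer's mode at that point, so the parser never has to tell the lexer where it is.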

    For languages like XML and HTML, it’s probably easier to build a custom parser than to use one of the parser generators like yacc, bison, or ANTLR. Markup languages are structured differently from programming languages, which are a better fit for the automatic tools.

    If your parser needs to turn a list of tokens back into the string it came from, that’s a sign that something is wrong in the design. You need to parse it a different way.
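As a rough illustration of that last point, here is a small hand-written parser that consumes tokens directly and never rebuilds source text. It assumes a lexer has already produced `(kind, text)` pairs and that element contents arrive as a single `TEXT` token. The kind names, the `parse_element` function, and the dict-based node shape are hypothetical, invented for this sketch.

```python
def parse_element(tokens, i=0):
    """Parse one element starting at tokens[i]; return (node, next_index)."""

    def expect(kind):
        # Consume the current token if it has the expected kind.
        nonlocal i
        tok_kind, text = tokens[i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind} at token {i}")
        i += 1
        return text

    expect("OPEN_TAG_START")
    name = expect("NAME")

    # Attributes: NAME EQUALS STRING, repeated until the tag ends.
    attrs = {}
    while tokens[i][0] == "NAME":
        key = expect("NAME")
        expect("EQUALS")
        attrs[key] = expect("STRING")[1:-1]  # drop the surrounding quotes
    expect("TAG_END")

    # Children: text arrives pre-assembled as one TEXT token, so nothing
    # has to be glued back together here.
    children = []
    while tokens[i][0] != "CLOSE_TAG_START":
        if tokens[i][0] == "TEXT":
            children.append(expect("TEXT"))
        else:
            child, i = parse_element(tokens, i)
            children.append(child)

    expect("CLOSE_TAG_START")
    closing = expect("NAME")
    if closing != name:
        raise SyntaxError(f"</{closing}> does not match <{name}>")
    expect("TAG_END")
    return {"name": name, "attrs": attrs, "children": children}, i


if __name__ == "__main__":
    example = [
        ("OPEN_TAG_START", "<"), ("NAME", "tag"),
        ("NAME", "attr"), ("EQUALS", "="), ("STRING", "'moo'"),
        ("NAME", "omg"), ("EQUALS", "="), ("STRING", "'wtf'"),
        ("TAG_END", ">"),
        ("TEXT", "attr='moo' omg='wtf'"),
        ("CLOSE_TAG_START", "</"), ("NAME", "tag"), ("TAG_END", ">"),
    ]
    node, _ = parse_element(example)
    print(node)
```

Because the text between the tags is already a single token, the parser only has to append it to the node. If it instead received NAME, EQUALS, and STRING tokens there, it would have to stitch them back into a string, which is the design smell described above.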
