Editorial Team
Asked: May 31, 2026


In “Modern Compiler Implementation in Java”, Andrew Appel states in an exercise that:

Lex has a lookahead operator / so that the regular expression abc/def matches abc only when followed by def (but def is not part of the matched string, and will be part of the next token(s)). Aho et al. [1986] describe, and Lex [Lesk 1975] uses, an incorrect algorithm for implementing lookahead (it fails on (a|ab)/ba with input aba, matching ab where it should match a). Flex [Paxson 1995] uses a better mechanism that works correctly for (a|ab)/ba but fails (with a warning message) on zx*/xy*. Design a better lookahead mechanism.

Does anyone know the solution to what he is describing?
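As a sketch of one possible answer to the exercise (not Appel's own solution), assume the intended meaning of r/x is: take the longest prefix of the input that matches r followed by x, then cut it at a position where r matches everything left of the cut and x matches everything right of it. The Python below uses `re.fullmatch` over slices as a stand-in for the automata a lexer generator would build: a forward pass records where r can end, a backward pass from the end of the match records where x can start, and the token ends at the last position found by both passes. The function name `split_trailing_context` is hypothetical.

```python
import re
from typing import Optional, Tuple


def split_trailing_context(r: str, x: str, text: str) -> Optional[Tuple[str, str]]:
    """Apply the rule r/x at the start of text; return (token, trailing context) or None.

    re.fullmatch over slices stands in for running automata; a real lexer
    generator would use a forward DFA for r and a reverse DFA for x.
    """
    rx = re.compile(f"(?:{r})(?:{x})")
    r_pat, x_pat = re.compile(r), re.compile(x)

    # 1. The rule covers the longest prefix that matches r followed by x.
    end = next((e for e in range(len(text), -1, -1) if rx.fullmatch(text, 0, e)), None)
    if end is None:
        return None

    # 2. Forward pass: positions i where r could end (text[:i] matches r).
    r_ends = {i for i in range(end + 1) if r_pat.fullmatch(text, 0, i)}

    # 3. Backward pass: positions i where x could start and run exactly to `end`.
    x_starts = {i for i in range(end + 1) if x_pat.fullmatch(text, i, end)}

    # 4. The token ends at the last position found by both passes; the set is
    #    non-empty because text[:end] matched r followed by x.
    cut = max(r_ends & x_starts)
    return text[:cut], text[cut:end]


print(split_trailing_context("a|ab", "ba", "aba"))   # ('a', 'ba')
print(split_trailing_context("zx*", "xy*", "zxxy"))  # ('zx', 'xy')
```

Under these assumptions, (a|ab)/ba on aba yields the token a, and zx*/xy* on zxxy yields zx. The rescanning here is quadratic; a real implementation would presumably run the forward DFA for r and the reverse DFA for x once each, keeping the work linear in the length of the match. Taking the last common position keeps r as long as possible, which is a design choice rather than a requirement.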

1 Answer

    Editorial Team
    Added an answer on May 31, 2026 at 3:36 pm

    “Does not work how I think it should” and “incorrect” are not always the same thing. Given the input

    aba
    

    and the pattern

    (ab|a)/ba
    

    it makes a certain amount of sense for the (ab|a) to match greedily, and then for the /ba constraint to be applied separately. You’re thinking that it should work like this regular expression:

    (ab|a)(ba)
    

    with the constraint that the part matched by (ba) is not consumed. That’s probably better because it removes some limitations, but since there weren’t any external requirements for what lex should do at the time it was written, you cannot call either behavior correct or incorrect.

    The naive way has the merit that adding trailing context doesn’t change the meaning of a token; it simply adds a separate constraint on what may follow it. But that does lead to limitations and surprises:

     {IDENT}  /* original code */
    
     {IDENT}/ab   /* ident, only when followed by ab */
    

    Oops, that won’t work: “ab” is swallowed into IDENT, precisely because the trailing context did not change IDENT’s meaning. That is a limitation, but maybe one that the author was willing to live with in exchange for simplicity. (What is the use case for making it more contextual, anyway?)
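Under the naive reading, the limitation can be checked directly. The sketch below assumes IDENT is `[a-z]+` (the answer does not define it) and implements {IDENT}/ab as "match IDENT exactly as the original rule would, then require ab to follow". Because a greedy `[a-z]+` always stops at a non-letter or at the end of the input, the context check can never succeed; `ident_then_ab` is a hypothetical name.

```python
import re
from typing import Optional

IDENT = "[a-z]+"  # assumed definition of {IDENT}; not given in the answer


def ident_then_ab(text: str) -> Optional[str]:
    """Naive {IDENT}/ab: match IDENT exactly as the original rule would, then check for "ab"."""
    m = re.match(IDENT, text)  # for [a-z]+ the greedy match is also the longest one
    if m is None:
        return None
    return m.group() if text.startswith("ab", m.end()) else None


# Each line: input, token from plain {IDENT}, result of {IDENT}/ab (always None).
for s in ["xab;", "fooab+1", "abab"]:
    print(s, re.match(IDENT, s).group(), ident_then_ab(s))
```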

    How about the other way? That could have surprises also:

     {IDENT}/ab  /* input is bracadabra:123 */
    

    Say the user wants this not to match, because bracadabra is not an identifier followed by (or ending in) ab. But {IDENT}/ab will match bracad, leaving abra:123 in the input.
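The joint reading can be sketched with a Python lookahead assertion, again assuming IDENT is `[a-z]+`. Python's greedy `+` backtracks from the longest identifier until `ab` follows, which for this pattern gives the longest IDENT that can be followed by ab and reproduces the surprise on bracadabra:123. The function name `ident_before_ab` is hypothetical.

```python
import re
from typing import Optional, Tuple

IDENT = "[a-z]+"  # assumed definition of {IDENT}; not given in the answer


def ident_before_ab(text: str) -> Optional[Tuple[str, str]]:
    """Joint {IDENT}/ab: IDENT may give back letters so that "ab" can follow it."""
    m = re.match(f"(?:{IDENT})(?=ab)", text)  # (?=ab) checks "ab" without consuming it
    if m is None:
        return None
    return m.group(), text[m.end():]  # (yytext, input left for the next token)


print(ident_before_ab("bracadabra:123"))  # ('bracad', 'abra:123')
```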

    A user could have expectations which are foiled no matter how you pin down the semantics.

    lex is now standardized by the Single UNIX Specification, which says this:

    r/x
    The regular expression r shall be matched only if it is followed by an occurrence of regular expression x ( x is the instance of trailing context, further defined below). The token returned in yytext shall only match r. If the trailing portion of r matches the beginning of x, the result is unspecified. The r expression cannot include further trailing context or the ‘$’ (match-end-of-line) operator; x cannot include the ‘^’ (match-beginning-of-line) operator, nor trailing context, nor the ‘$’ operator. That is, only one occurrence of trailing context is allowed in a lex regular expression, and the ‘^’ operator only can be used at the beginning of such an expression.

    So you can see that there is room for interpretation here. The r and x can be treated as separate regexes, with a match for r computed in the normal way as if it were alone, and then x applied as a special constraint.

    The spec also discusses this very issue (you are in luck):

    The following examples clarify the differences between lex regular expressions and regular expressions appearing elsewhere in this volume of IEEE Std 1003.1-2001. For regular expressions of the form “r/x”, the string matching r is always returned; confusion may arise when the beginning of x matches the trailing portion of r. For example, given the regular expression “a*b/cc” and the input “aaabcc”, yytext would contain the string “aaab” on this match. But given the regular expression “x*/xy” and the input “xxxy”, the token xxx, not xx, is returned by some implementations because xxx matches “x*”.

    In the rule “ab*/bc”, the “b*” at the end of r extends r’s match into the beginning of the trailing context, so the result is unspecified. If this rule were “ab/bc”, however, the rule matches the text “ab” when it is followed by the text “bc”. In this latter case, the matching of r cannot extend into the beginning of x, so the result is specified.
    As you can see there are some limitations in this feature.
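The spec's two examples can be reproduced with a simple model of the strategy Appel criticizes: find the longest prefix matching r followed by x, then end the token at the last position where r alone could have ended, without re-checking x on the rest. This is a sketch of one model consistent with the behavior described, not the code of any particular lex; `last_r_end` is a hypothetical name, and it assumes the rule matches some prefix of the input.

```python
import re


def last_r_end(r: str, x: str, text: str) -> str:
    """Model of the flawed strategy: longest r-then-x match, then the LAST prefix matching r.

    Assumes r/x matches some prefix of text; max() raises ValueError otherwise.
    """
    rx, r_pat = re.compile(f"(?:{r})(?:{x})"), re.compile(r)
    end = max(e for e in range(len(text) + 1) if rx.fullmatch(text, 0, e))
    # The flaw: x is never re-checked against text[cut:end].
    cut = max(i for i in range(end + 1) if r_pat.fullmatch(text, 0, i))
    return text[:cut]


print(last_r_end("a*b", "cc", "aaabcc"))  # aaab
print(last_r_end("x*", "xy", "xxxy"))     # xxx
```

The same model returns ab for Appel's (a|ab)/ba on aba. Accepting a cut only when x also matches the text between the cut and the end of the match would give xx for the second example instead.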

    Unspecified behavior means that there are several choices about what the behavior should be, none of which is more correct than the others (so don’t write patterns like that if you want your lex program to be portable). “As you can see, there are some limitations in this feature.”

Related Questions

I'm currently working my way through Andrew Appel's Modern Compiler Implementation in Java, and
Modern browsers have multi-tab interface, but JavaScript function window.showModalDialog() creates a modal dialog that
In several modern programming languages (including C++, Java, and C#), the language allows integer
I notice that modern C and C++ code seems to use size_t instead of
Now that most modern browsers support AJAX and client-side requests without performing a POST,
In modern compiler toolchains, how are dynamically linked libraries implemented? Do they have any
I'm trying to override Ant compiler attributes via the command line so that all
I've read that there is some compiler optimization when using #pragma once which can
The java code I'm working on at the moment has often a structure like
I know that the topic of C++ delegates has been done to death, and

© 2021 The Archive Base. All Rights Reserved