The Archive Base

Editorial Team
Asked: May 25, 2026

Lexer DFA results in “code too large” error

I’m trying to parse Java Server Pages using ANTLR 3.

Java has a limit of 64k for the byte code of a single method, and I keep running into a “code too large” error when compiling the Java source generated by ANTLR.

In some cases, I’ve been able to fix it by compromising my lexer. For example, JSP uses the XML “Name” token, which can include a wide variety of characters. I decided to accept only ASCII characters in my “Name” token, which drastically simplified some tests in the lexer and allowed it to compile.
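To illustrate the trade-off, here is a minimal Java sketch comparing an ASCII-only test for the first character of a Name with the full NameStartChar production from XML 1.0. The class and method names are hypothetical and this is not ANTLR output; the ranges are those listed in the XML 1.0 specification. The ASCII version needs four alternatives, the full version sixteen, and in a generated DFA each alternative tends to reappear in every transition that tests for a Name character.

```java
final class NameStartDemo {
    /** ASCII-only approximation: two letter ranges plus '_' and ':'. */
    static boolean isAsciiNameStart(int ch) {
        return ('a' <= ch && ch <= 'z')
            || ('A' <= ch && ch <= 'Z')
            || ch == '_'
            || ch == ':';
    }

    // Inclusive [low, high] pairs for the non-ASCII part of NameStartChar (XML 1.0).
    private static final int[][] NON_ASCII_RANGES = {
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
        {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F}, {0x2C00, 0x2FEF},
        {0x3001, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
    };

    /** Full NameStartChar test: the ASCII alternatives plus twelve more ranges. */
    static boolean isXmlNameStart(int ch) {
        if (isAsciiNameStart(ch)) {
            return true;
        }
        for (int[] range : NON_ASCII_RANGES) {
            if (range[0] <= ch && ch <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int eAcute = 0xE9; // U+00E9, LATIN SMALL LETTER E WITH ACUTE
        System.out.println(isAsciiNameStart(eAcute)); // false
        System.out.println(isXmlNameStart(eAcute));   // true
    }
}
```

A hand-written lexer can keep the ranges in a table like this, so the cost is paid once rather than once per transition.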

However, I’ve gotten to the point where I can’t cut any more corners, but the DFA is still too complex.

What should I do about it?

Are there common mistakes that result in complex DFAs?

Is there a way to inhibit generation of the DFA, perhaps relying on semantic predicates or fixed lookahead to help with the prediction?

Writing this lexer by hand will be easy, but before I give up on ANTLR, I want to make sure I’m not overlooking something obvious.

Background

ANTLR 3 lexers use a DFA to decide how to tokenize input. In the generated DFA, there is a method called specialStateTransition(). This method contains a switch statement with a case for each state in the DFA. Within each case, there is a series of if statements, one for each transition from the state. The condition of each if statement tests an input character to see if it matches the transition.

These character-testing conditions can be very complex. They normally have the following form:

int ch = … ; /* "ch" is the next character in the input stream. */
switch(s) { /* "s" is the current state. */
  …
  case 13 :
    if ((('a' <= ch) && (ch <= 'z')) || (('A' <= ch) && (ch <= 'Z')) || … )
      s = 24; /* If the character matches, move to the next state. */
    else if …

A seemingly minor change to my lexer can result in dozens of comparisons for a single transition, several transitions for each state, and scores of states. I think that some of the states being considered are impossible to reach due to my semantic predicates, but it seems like semantic predicates are ignored by the DFA. (I could be misreading things though—this code is definitely not what I’d be able to write by hand!)
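For readers who want to see the shape described above in compilable form, here is a miniature, hand-written Java sketch. It is not actual ANTLR output: the class name, the NO_MATCH constant, and states 25 and 30 are hypothetical, and only states 13 and 24 come from the fragment above. Every state adds a case, and every transition adds an if with its own chain of range comparisons, all inside one method body, which is how the 64k limit gets hit.

```java
final class TransitionDemo {
    static final int NO_MATCH = -1; // hypothetical "no transition" marker

    /** Returns the next state for state s on input character ch, or NO_MATCH. */
    static int transition(int s, int ch) {
        switch (s) {
            case 13:
                // One transition, two range tests.
                if (('a' <= ch && ch <= 'z') || ('A' <= ch && ch <= 'Z')) {
                    return 24;
                } else if ('0' <= ch && ch <= '9') {
                    return 25;
                }
                break;
            case 24:
                if (ch == '=') {
                    return 30;
                }
                break;
            default:
                break;
        }
        return NO_MATCH;
    }

    public static void main(String[] args) {
        System.out.println(transition(13, 'q')); // 24
        System.out.println(transition(13, '=')); // -1
    }
}
```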

I found an ANTLR 2 grammar in the Jsp2x tool, but I’m not satisfied with its parse tree, and I want to refresh my ANTLR skills, so I thought I’d try writing my own. I am using ANTLRWorks, and I tried to generate graphs for the DFA, but there appear to be bugs in ANTLRWorks that prevent it.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 7:35 pm

    Grammars that are very large (many different tokens) have that problem, unfortunately (SQL grammars suffer from this too).

    Sometimes this can be fixed by making certain lexer rules fragments, as opposed to “full” lexer rules that produce tokens, and/or by rearranging the way characters are matched inside the rules. Judging by what you have already tried, though, I doubt much can be gained in your case. However, if you’re willing to post your lexer grammar here on SO, I, or someone else, might see something that could be changed.

    In general, this problem is fixed by splitting the lexer grammar into 2 or more separate lexer grammars and then importing those in one “master” grammar. In ANTLR terms, these are called composite grammars. See this ANTLR Wiki page about them: http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars

    EDIT

    As @Gunther rightfully mentioned in the comment beneath the OP, see the Q&A: Why my antlr lexer java class is "code too large"? where a small change (the removal of a certain predicate) caused this “code too large”-error to disappear.
