Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 5977735
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 22, 20262026-05-22T21:24:48+00:00 2026-05-22T21:24:48+00:00

First question stream Hello everyone, This could be a follow-up on this question: Antlr

  • 0

First question stream

Hello everyone,

This could be a follow-up on this question: Antlr rule priorities

I’m trying to write an ANTLR grammar for the reStructuredText markup language.

The main problem I’m facing is : “How to match any sequence of characters (regular text) without masking other grammar rules?”

Let’s take an example with a paragraph with inline markup:

In `Figure 17-6`_, we have positioned ``before_ptr`` so that it points to the element 
*before* the insert point. The variable ``after_ptr`` points to the element *after* the 
insert. In other words, we are going to put our new element **in between** ``before_ptr`` 
and ``after_ptr``.

I thought that writing rules for inline markup text would be easy. So I wrote a simple grammar:

grammar Rst;

options {
    output=AST;
    language=Java;
    backtrack=true;
    //memoize=true;
}

@members {
boolean inInlineMarkup = false;
}

// PARSER

text
    : inline_markup (WS? inline_markup)* WS? EOF
    ;


inline_markup
@after {
inInlineMarkup = false;
}
    : {!inInlineMarkup}? (emphasis|strong|litteral|link)
    ;

emphasis
@init {
inInlineMarkup = true;
}
    : '*' (~'*')+ '*' {System.out.println("emphasis: " + $text);}
    ;

strong
@init {
inInlineMarkup = true;
}
    : '**' (~'*')+ '**' {System.out.println("bold: " + $text);}
    ;

litteral
@init {
inInlineMarkup = true;
}
    : '``' (~'`')+ '``' {System.out.println("litteral: " + $text);}
    ;

link
@init {
inInlineMarkup = true;
}
    : inline_internal_target
    | footnote_reference
    | hyperlink_reference
    ;

inline_internal_target
    : '_`' (~'`')+ '`' {System.out.println("inline_internal_target: " + $text);}
    ;

footnote_reference
    : '[' (~']')+ ']_' {System.out.println("footnote_reference: " + $text);}
    ;


hyperlink_reference
    : ~(' '|'\t'|'\u000C'|'_')+ '_' {System.out.println("hyperlink_reference: " + $text);}
    |   '`' (~'`')+ '`_' {System.out.println("hyperlink_reference (long): " + $text);}
    ;

// LEXER

WS  
  : (' '|'\t'|'\u000C')+
  ; 

NEWLINE
  : '\r'? '\n'
  ;

This simple grammar doesn’t work. And I didn’t even try to match regular text…

My questions:

  • Could someone point to my errors and maybe give me a hint on how to match regular text?
  • Is there a way to set priority on the grammar rules? Maybe this could be a lead.

Thanks in advance for your help 🙂

Robin


Second question stream

Thank you very much for your help! I would have had a hard time figuring my errors… I’m not writing that grammar (only) to learn ANTLR, I’m trying to code an IDE plugin for eclipse. And for that, I need a grammar 😉

I managed to go further in the grammar and wrote a text rule:

grammar Rst;

options {
    output=AST;
    language=Java;
}



@members {
boolean inInlineMarkup = false;
}

//////////////////
// PARSER RULES //
//////////////////

file
  : line* EOF
  ;


line
  : text* NEWLINE
  ;

text
    : inline_markup
    | normal_text
    ;

inline_markup
@after {
inInlineMarkup = false;
}
    : {!inInlineMarkup}? {inInlineMarkup = true;} 
  (
  | STRONG
  | EMPHASIS
  | LITTERAL
  | INTERPRETED_TEXT
  | SUBSTITUTION_REFERENCE
  | link
  )
    ;


link
    : INLINE_INTERNAL_TARGET
    | FOOTNOTE_REFERENCE
    | HYPERLINK_REFERENCE
    ;

normal_text
  : {!inInlineMarkup}? 
   ~(EMPHASIS
      |SUBSTITUTION_REFERENCE
      |STRONG
      |LITTERAL
      |INTERPRETED_TEXT
      |INLINE_INTERNAL_TARGET
      |FOOTNOTE_REFERENCE
      |HYPERLINK_REFERENCE
      |NEWLINE
      )
  ;
//////////////////
// LEXER TOKENS //
//////////////////

EMPHASIS
    : STAR ANY_BUT_STAR+ STAR {System.out.println("EMPHASIS: " + $text);}
    ;

SUBSTITUTION_REFERENCE
  : PIPE ANY_BUT_PIPE+ PIPE  {System.out.println("SUBST_REF: " + $text);}
  ;

STRONG
    : STAR STAR ANY_BUT_STAR+ STAR STAR {System.out.println("STRONG: " + $text);}
    ;

LITTERAL
    : BACKTICK BACKTICK ANY_BUT_BACKTICK+ BACKTICK BACKTICK {System.out.println("LITTERAL: " + $text);}
    ;
INTERPRETED_TEXT
  : BACKTICK ANY_BUT_BACKTICK+ BACKTICK {System.out.println("LITTERAL: " + $text);}
  ;

INLINE_INTERNAL_TARGET
    : UNDERSCORE BACKTICK ANY_BUT_BACKTICK+ BACKTICK {System.out.println("INLINE_INTERNAL_TARGET: " + $text);}
    ;

FOOTNOTE_REFERENCE
    : L_BRACKET ANY_BUT_BRACKET+ R_BRACKET UNDERSCORE {System.out.println("FOOTNOTE_REFERENCE: " + $text);}
    ;


HYPERLINK_REFERENCE
  : BACKTICK ANY_BUT_BACKTICK+ BACKTICK UNDERSCORE {System.out.println("HYPERLINK_REFERENCE (long): " + $text);}
  | ANY_BUT_ENDLINK+ UNDERSCORE {System.out.println("HYPERLINK_REFERENCE (short): " + $text);}
  ;

WS  
  : (' '|'\t')+ {$channel=HIDDEN;}
  ; 

NEWLINE
  : '\r'? '\n' {$channel=HIDDEN;}
  ;


///////////////
// FRAGMENTS //
///////////////

fragment ANY_BUT_PIPE
  : ESC PIPE
  | ~(PIPE|'\n'|'\r')
  ;
fragment ANY_BUT_BRACKET
  : ESC R_BRACKET
  | ~(R_BRACKET|'\n'|'\r')
  ;
fragment ANY_BUT_STAR
  : ESC STAR
  | ~(STAR|'\n'|'\r')
  ;
fragment ANY_BUT_BACKTICK
  : ESC BACKTICK
  | ~(BACKTICK|'\n'|'\r')
  ;
fragment ANY_BUT_ENDLINK
  : ~(UNDERSCORE|' '|'\t'|'\n'|'\r')
  ;



fragment ESC
  : '\\'
  ;
fragment STAR
  : '*'
  ;
fragment BACKTICK
  : '`'
  ;
fragment PIPE
  : '|'
  ;
fragment L_BRACKET
  : '['
  ;
fragment R_BRACKET
  : ']'
  ;
fragment UNDERSCORE
  : '_'
  ;

The grammar is working fine for inline_markup but normal_text is not matched.

Here is my test class:

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.tree.Tree;

public class Test {

    public static void main(String[] args) throws RecognitionException, IOException {

        InputStream is = Test.class.getResourceAsStream("test.rst");
        Reader r = new InputStreamReader(is);
        StringBuilder source = new StringBuilder();
        char[] buffer = new char[1024];
        int readLenght = 0;
        while ((readLenght = r.read(buffer)) > 0) {
            if (readLenght < buffer.length) {
                source.append(buffer, 0, readLenght);
            } else {
                source.append(buffer);
            }
        }
        r.close();
        System.out.println(source.toString());

        ANTLRStringStream in = new ANTLRStringStream(source.toString());
        RstLexer lexer = new RstLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        RstParser parser = new RstParser(tokens);
        RstParser.file_return out = parser.file();
        System.out.println(((Tree)out.getTree()).toStringTree());
    }
}

And the input file I use:

In `Figure 17-6`_, we have positioned ``before_ptr`` so that it points to the element 
*before* the insert point. The variable ``after_ptr`` points to the |element| *after* the 
insert. In other words, `we are going`_ to put_ our new element **in between** ``before_ptr`` 
and ``after_ptr``.

And I get this output:

HYPERLINK_REFERENCE (short): 7-6`_
line 1:2 mismatched character ' ' expecting '_'
line 1:10 mismatched character ' ' expecting '_'
line 1:18 mismatched character ' ' expecting '_'
line 1:21 mismatched character ' ' expecting '_'
line 1:26 mismatched character ' ' expecting '_'
line 1:37 mismatched character ' ' expecting '_'
LITTERAL: `before_ptr`
line 1:86 no viable alternative at character '\r'
line 1:55 mismatched character ' ' expecting '_'
line 1:60 mismatched character ' ' expecting '_'
line 1:63 mismatched character ' ' expecting '_'
line 1:70 mismatched character ' ' expecting '_'
line 1:73 mismatched character ' ' expecting '_'
line 1:77 mismatched character ' ' expecting '_'
line 1:85 mismatched character ' ' expecting '_'
EMPHASIS: *before*
line 2:12 mismatched character ' ' expecting '_'
line 2:19 mismatched character ' ' expecting '_'
line 2:26 mismatched character ' ' expecting '_'
LITTERAL: `after_ptr`
line 2:30 mismatched character ' ' expecting '_'
line 2:39 mismatched character ' ' expecting '_'
line 2:90 no viable alternative at character '\r'
line 2:60 mismatched character ' ' expecting '_'
line 2:63 mismatched character ' ' expecting '_'
line 2:67 mismatched character ' ' expecting '_'
line 2:77 mismatched character ' ' expecting '_'
line 2:85 mismatched character ' ' expecting '_'
line 2:89 mismatched character ' ' expecting '_'
line 3:7 mismatched character ' ' expecting '_'
line 3:10 mismatched character ' ' expecting '_'
line 3:16 mismatched character ' ' expecting '_'
line 3:23 mismatched character ' ' expecting '_'
line 3:27 mismatched character ' ' expecting '_'
line 3:31 mismatched character ' ' expecting '_'
line 3:42 mismatched character ' ' expecting '_'
line 3:51 mismatched character ' ' expecting '_'
line 3:55 mismatched character ' ' expecting '_'
line 3:63 mismatched character ' ' expecting '_'
line 3:94 mismatched character '\r' expecting '*'
line 4:3 mismatched character ' ' expecting '_'
line 4:18 no viable alternative at character '\r'
line 4:18 mismatched character '\r' expecting '_'
HYPERLINK_REFERENCE (short): oing`_
HYPERLINK_REFERENCE (short): ut_
EMPHASIS: *in between*
LITTERAL: `after_ptr`
BR.recoverFromMismatchedToken
line 0:-1 mismatched input '<EOF>' expecting NEWLINE
null

Can you point to my error(s)? (the parser works for inline markup without errors when I add the filter=true; option to the grammar)

Robin

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-22T21:24:48+00:00Added an answer on May 22, 2026 at 9:24 pm

    Here’s a quick demo how you could parse this reStructeredText. Note that it just handles a minor set of all available markup-syntax, and by adding more to it, you will affect the existing parser/lexer rules: so there is much, much more work to be done!

    Demo

    grammar RST;
    
    options {
      output=AST;
      backtrack=true;
      memoize=true;
    }
    
    tokens {
      ROOT;
      PARAGRAPH;
      INDENTATION;
      LINE;
      WORD;
      BOLD;
      ITALIC;
      INTERPRETED_TEXT;
      INLINE_LITERAL;
      REFERENCE;
    }
    
    parse
      :  paragraph+ EOF -> ^(ROOT paragraph+)
      ;
    
    paragraph
      :  line+ -> ^(PARAGRAPH line+)
      |  Space* LineBreak -> /* omit line-breaks between paragraphs from AST */
      ;
    
    line
      :  indentation text+ LineBreak -> ^(LINE text+)
      ;
    
    indentation
      :  Space* -> ^(INDENTATION Space*)
      ;
    
    text
      :  styledText
      |  interpretedText
      |  inlineLiteral
      |  reference
      |  Space
      |  Star
      |  EscapeSequence
      |  Any
      ;
    
    styledText
      :  bold
      |  italic
      ;
    
    bold
      :  Star Star boldAtom+ Star Star -> ^(BOLD boldAtom+)
      ;  
    
    italic
      :  Star italicAtom+ Star -> ^(ITALIC italicAtom+)
      ;
    
    boldAtom
      :  ~(Star | LineBreak)
      |  italic
      ;
    
    italicAtom
      :  ~(Star | LineBreak)
      |  bold
      ;
    
    interpretedText
      :  BackTick interpretedTextAtoms BackTick -> ^(INTERPRETED_TEXT interpretedTextAtoms)
      ;
    
    interpretedTextAtoms
      :  ~BackTick+
      ;
    
    inlineLiteral
      :  BackTick BackTick inlineLiteralAtoms BackTick BackTick -> ^(INLINE_LITERAL inlineLiteralAtoms)
      ;
    
    inlineLiteralAtoms
      :  inlineLiteralAtom+
      ;
    
    inlineLiteralAtom
      :  ~BackTick
      |  BackTick ~BackTick
      ;
    
    reference
      :  Any+ UnderScore -> ^(REFERENCE Any+)
      ;
    
    UnderScore
      :  '_'
      ;
    
    BackTick
      :  '`'
      ;
    
    Star
      :  '*'
      ;
    
    Space
      :  ' ' 
      |  '\t'
      ;
    
    EscapeSequence
      :  '\\' ('\\' | '*')
      ;
    
    LineBreak
      :  '\r'? '\n'
      |  '\r'
      ;
    
    Any
      :  .
      ;
    

    When you generate a parser and lexer from the above, and let it parse the following input file:

    ***x*** **yyy** *zz* *
    a b c
    
    P2 ``*a*`b`` `q`
    Python_
    
    

    (note the trailing line break!)

    the parser will produce the following AST:

    enter image description here

    EDIT

    The graph can be created by running this class:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;
    
    public class Main {
      public static void main(String[] args) throws Exception {
        String source =
            "***x*** **yyy** *zz* *\n" +
            "a b c\n" +
            "\n" +
            "P2 ``*a*`b`` `q`\n" +
            "Python_\n";
        RSTLexer lexer = new RSTLexer(new ANTLRStringStream(source));
        RSTParser parser = new RSTParser(new CommonTokenStream(lexer));
        CommonTree tree = (CommonTree)parser.parse().getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
      }
    }
    

    or if your source comes from a file, do:

    RSTLexer lexer = new RSTLexer(new ANTLRFileStream("test.rst"));
    

    or

    RSTLexer lexer = new RSTLexer(new ANTLRFileStream("test.rst", "???"));
    

    where "???" is the encoding of your file.

    The class above will print the AST as a DOT file to the console. You can use a DOT viewer to display the AST. In this case, I posted an image created by kgraphviewer. But there are many more viewers around. A nice online one is this one, which appears to be using kgraphviewer under “the hood”. Good luck!

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

First question here so hello everyone. The requirement I'm working on is a small
Let me first apologize if this question could sound perhaps sort of amateurish for
First question on Stackoverflow (.Net 2.0): So I am trying to return an XML
first of all sorry by my bad english, Hello i am new in this
this is my first question here @stackoverflow. I'm writing a monitoring tool for some
I'm trying to open a stream to a file. First I need to save
This question changed a lot since it was first asked because I didn't understand
First of all, I love ASP.NET MVC. This question is not a criticism of
In answering this question , I suggested that the OP open a stream at
i'm trying to write an application to stream videos like SipDroid, but SipDroid only

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.