The Archive Base

Editorial Team
Asked: June 16, 2026

I’m trying to create a grammar for the SRT subtitle format.

Here is an example of an SRT file:

1
00:00:02,218 --> 00:00:04,209
[SHELDON SPEAKING IN MANDARIN]

2
00:00:04,721 --> 00:00:05,745
No, it's:

3
00:00:05,922 --> 00:00:07,913
[SPEAKING IN MANDARIN]

4
00:00:09,392 --> 00:00:11,383
[SPEAKING IN MANDARIN]

5
00:00:13,430 --> 00:00:15,193
What's this?

6
00:00:16,266 --> 00:00:18,029
That's what you did.

7
00:00:18,201 --> 00:00:22,467
I assumed, as in a number of languages,
that the gesture was part of the phrase.

8
00:00:22,639 --> 00:00:25,233
- Well, it's not.
- Why am I supposed to know that?

9
00:00:25,408 --> 00:00:28,900
As teacher, it's your obligation
to separate your personal idiosyncrasies...

10
00:00:29,079 --> 00:00:30,512
...from the subject matter.

11
00:00:31,081 --> 00:00:33,845
- I'm glad you decided to learn Mandarin.
- Why?

326
00:18:56,818 --> 00:19:00,720
Actually, I've heard
far too much about Schrödinger's cat.

327
00:19:01,623 --> 00:19:03,022
Good.

328
00:19:09,131 --> 00:19:11,895
All right, the cat's alive.
Let's go to dinner.

329
00:19:12,000 --> 00:19:15,072
Download Movie Subtitles Searcher from www.OpenSubtitles.org

Here is my grammar for ANTLR (v3.4):

grammar Exp;


parse
    :  (SUBTITLE)+
    ;

SUBTITLE
    : i=ID NL 
      t1=Timestamp SPACE ARROW SPACE t2=Timestamp NL 
      txt1 = TEXT

        {
            System.out.println("id="+$i); 
            System.out.println("t1= "+$t1); 
            System.out.println("t2= "+$t2);
            System.out.println("txt1= "+$txt1);

        }
    ;

TEXT 
    : ((TextLine NL NL)|(TextLine NL TextLine NL NL))
    ;

ID
    : DIG+
    ;

ARROW
    : '-->'
    ;

Timestamp
    : DIG DIG ':' DIG DIG ':' DIG DIG ',' DIG DIG DIG
    ;

TextLine
  :  ~('\r' | '\n')*
  ;

NL
  :  '\r'? '\n'
  |  '\r'
  ;

fragment
DIG 
    : '0'..'9'
    ;

fragment
SPACE
    :   ' ' | '\t'
    ;

My simple code:

String input = IOUtils.toString(Test.class.getResourceAsStream("/subtitles.srt"));
ExpLexer lexer = new ExpLexer(new ANTLRStringStream(input));
CommonTokenStream stream = new CommonTokenStream(lexer);
ExpParser parser = new ExpParser(stream);
parser.parse();

Almost everything works perfectly if the file ends with two newlines. If it does not, I get this error:

line 1484:0 no viable alternative at character '<EOF>'

Any advice on how to make my grammar more flexible? It should accept a file that ends with one newline, two newlines, or more.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 8:27 pm

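    You’re using far too many lexer rules. Because SUBTITLE and TEXT start with an uppercase letter, ANTLR treats them as lexer rules, and TEXT insists on an empty line after the last text line, so the lexer fails when the file ends without one. Keep the lexer rules small and describe the structure in parser rules instead.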

    Try something like this:

    grammar T;
    
    options {
      output=AST;
    }
    
    tokens {
      BLOCKS;
      BLOCK;
      TIME_RANGE;
      LINES;
      LINE;
      WORD;
    }
    
    parse
     : LineBreak* blocks LineBreak* EOF -> blocks
     ;
    
    blocks
     : block (LineBreak LineBreak+ block)* -> ^(BLOCKS block+)
     ;
    
    block 
     : Number Spaces? LineBreak time_range LineBreak text_lines -> ^(BLOCK Number time_range text_lines)
     ;
    
    time_range
     : Time Spaces? Arrow Spaces? Time Spaces? -> ^(TIME_RANGE Time Time)
     ;
    
    text_lines
     : line (LineBreak line)* -> ^(LINES line+)
     ;
    
    line
     : Spaces? word (Spaces word)* Spaces? -> ^(LINE word+)
     ;
    
    word
     : (Other | Number | Dashes | Arrow)+ -> WORD[$text]
     ;
    
    Time      : Number ':' Number ':' Number ',' Number;
    Arrow     : '-->';
    Dashes    : '-'+;
    Number    : '0'..'9'+;
    LineBreak : '\r'? '\n' | '\r';
    Spaces    : (' ' | '\t')+;
    Other     : . ;
    

    which will parse the input:

    1
    00:00:02,218 --> 00:00:04,209
    [A B C]
    
    2
    00:00:04,721 --> 00:00:05,745
    -- Line 1
    -- Line 2
    
    
    
    3
    00:00:05,922 --> 00:00:07,913
    mu --> MU

    into the following AST:

    [Image: AST produced for the input above]
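As a cross-check of the block structure the grammar accepts, here is a minimal plain-Java sketch. It is not ANTLR and not the generated parser: it is a hand-written splitter that follows the same shape (optional line breaks, then blocks separated by two or more line breaks, then optional line breaks before the end of input). The class name `SrtBlocks`, the nested `Block` type, and the `split` method are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SrtBlocks {
    /** One subtitle block: its number, its time-range line, and its text lines. */
    static final class Block {
        final String number;
        final String timeRange;
        final List<String> lines;

        Block(String number, String timeRange, List<String> lines) {
            this.number = number;
            this.timeRange = timeRange;
            this.lines = lines;
        }
    }

    static List<Block> split(String input) {
        // Treat "\r\n", "\n" and "\r" alike, as the LineBreak rule does.
        String text = input.replace("\r\n", "\n").replace('\r', '\n');
        // Drop line breaks before the first block and after the last one.
        text = text.replaceAll("\\A\n+", "").replaceAll("\n+\\z", "");

        List<Block> blocks = new ArrayList<>();
        if (text.isEmpty()) {
            // Assumption of this sketch: empty input yields no blocks
            // (the grammar itself requires at least one block).
            return blocks;
        }
        // Two or more consecutive line breaks separate blocks.
        for (String chunk : text.split("\n{2,}")) {
            String[] rows = chunk.split("\n");
            if (rows.length < 3) {
                throw new IllegalArgumentException("Incomplete block: " + chunk);
            }
            List<String> lines =
                new ArrayList<>(Arrays.asList(rows).subList(2, rows.length));
            blocks.add(new Block(rows[0].trim(), rows[1].trim(), lines));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String srt = "1\n00:00:02,218 --> 00:00:04,209\n[A B C]\n\n"
                   + "2\n00:00:04,721 --> 00:00:05,745\n-- Line 1\n-- Line 2\n";
        for (Block b : split(srt)) {
            System.out.println(b.number + " | " + b.timeRange + " | " + b.lines);
        }
    }
}
```

The point of the sketch is that the number of line breaks at the end of the file no longer matters, which is the flexibility the question asks for. Unlike the grammar, it does not validate the timestamps or tokenize the text lines.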

    EDIT

    I have a problem when the text contains a number followed by a colon, for example ‘Season 1 Episode 15:’ or ‘“I’ll call you at 11:00. Victoria.”’. I tried to modify your example, but with no success.

    Untested, but I think this should work: make everything after the first colon in the Time rule optional. Then, at the end of the rule, check whether the last Number in Time was matched. If it was not, change the type of the token to Other.

    Time      
     : Number ':' (Number (':' (Number (',' last=Number?)?)?)?)?
       {
         if($last.text == null) $type = Other;
       }
     ;
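To make the intended effect of that action concrete, here is a small plain-Java sketch that mirrors the relaxed Time rule with a regular expression. It is only an illustration of which lexemes would keep the type Time and which would be relabelled Other; it is not the code ANTLR generates, and the class name `TimeOrOther` and the method `classify` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimeOrOther {
    // Mirrors: Number ':' (Number (':' (Number (',' last=Number?)?)?)?)?
    // Capturing group 5 plays the role of the label `last`.
    private static final Pattern RELAXED_TIME =
        Pattern.compile("\\d+:(\\d+(:(\\d+(,(\\d+)?)?)?)?)?");

    /** Returns "Time" for a complete timestamp and "Other" for a partial one. */
    static String classify(String lexeme) {
        Matcher m = RELAXED_TIME.matcher(lexeme);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Not matched by the relaxed Time rule: " + lexeme);
        }
        // Same decision as the action: no final Number means type Other.
        return m.group(5) != null ? "Time" : "Other";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00:00:02,218", "11:00", "15:"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With this behaviour, text such as ‘15:’ or ‘11:00’ becomes an Other token that the word rule already accepts, while a complete timestamp stays a Time token.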
    