The Archive Base Latest Questions

Editorial Team
Asked: June 6, 2026 (2026-06-06T21:19:48+00:00)

I am writing lexer rules for a custom description language using pyLR1, which should include time literals such as:

10h30m     # meaning 10 hours + 30 minutes
5m30s      # meaning 5 minutes + 30 seconds
10h20m15s  # meaning 10 hours + 20 minutes + 15 seconds
15.6s      # meaning 15.6 seconds

The order of the hour, minute and second parts shall be fixed to h, m, s. More precisely, I want the following valid combinations: hms, hm, h, ms, m and s (with numbers between the different segments, of course).
As a bonus, the regex should check for decimal (i.e. non-integer) numbers in the segments and only allow them in the least significant segment.

So for all but the last group I have a number match like:

([0-9]+)

And for the last group:

([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)  # to allow for .5 and 0.5 and 5.0 and 5

Going through all the combinations of h, m and s, a cute little Python script gives me the following regex:

(([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)h|([0-9]+)h([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m|([0-9]+)h([0-9]+)m([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s|([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m|([0-9]+)m([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s|([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s) 

Obviously, this is a bit of a horror of an expression. Is there any way to simplify it? The answer must work with Python's re module, and I will also accept answers which do not work with pyLR1 if that is due to its restricted subset of regular expressions.

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 9:19 pm (2026-06-06T21:19:49+00:00)

    You can factorise your regular expression. Using the notation h, m, s to denote each of the sub-regexes, the most basic version is:

    h|hm|hms|ms|m|s
    

    which is what you have currently. You can break this into:

    (h|hm|hms)|(ms|m)|s
    

    and then, pulling out h from the first group and m from the second, we get (using (x|) == x?):

    h(m|ms)?|ms?|s
    

    Continuing on we get to

    h(ms?)?|ms?|s
    

    which is probably simpler (and probably the simplest).
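The factoring steps above can be checked mechanically with Python's re module. The following is a minimal sketch, not part of the original answer: the sub-patterns H, M and S (an integer followed by its unit letter) and the helper same_verdict are assumptions made for illustration, and decimals are left out at this stage.

```python
import re
from itertools import product

# Hypothetical sub-patterns: an integer followed by its unit letter.
H, M, S = r"[0-9]+h", r"[0-9]+m", r"[0-9]+s"

# Enumerated form: h|hm|hms|ms|m|s
enumerated = re.compile("|".join([H, H + M, H + M + S, M + S, M, S]))

# Factored form: h(ms?)?|ms?|s
factored = re.compile(f"{H}(?:{M}(?:{S})?)?|{M}(?:{S})?|{S}")


def same_verdict(text):
    """True when both patterns agree on whether the whole text matches."""
    return bool(enumerated.fullmatch(text)) == bool(factored.fullmatch(text))


# Every ordered selection of one to three segments, valid or not.
segments = ["10h", "30m", "15s"]
candidates = [
    "".join(parts)
    for count in range(1, 4)
    for parts in product(segments, repeat=count)
]

assert all(same_verdict(text) for text in candidates)
accepted = sorted(text for text in candidates if factored.fullmatch(text))
print(accepted)
```

Only the six combinations listed in the question should survive in accepted; a string such as 10h15s (hours directly followed by seconds) is rejected by both forms.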


    Adding in the regex d to denote decimals (as in \.[0-9]+), this could be written as

    h(d|m(d|sd?)?)?|m(d|sd?)?|sd?
    

    (i.e. at each stage optionally have either decimals, or a continuation to the next of h, m or s; in the concrete regex the unit letter comes after the decimals.)

    This would result in something like (for just hours and minutes):

    [0-9]+((\.[0-9]+)?h|h[0-9]+(\.[0-9]+)?m)|[0-9]+(\.[0-9]+)?m
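Extending the hours-and-minutes pattern to seconds could look roughly like the sketch below. The names INT, FRAC, SEC, MIN_TAIL, HOUR_TAIL and is_time_literal are assumptions introduced here, not part of the original answer. Note that it uses the answer's decimal form (\.[0-9]+), so unlike the number pattern in the question it does not accept .5 or 5. as a number.

```python
import re

INT = r"[0-9]+"
FRAC = r"(?:\.[0-9]+)?"  # optional decimals, as in the answer's d

# Seconds can only ever be the last segment, so decimals are always allowed.
SEC = f"{INT}{FRAC}s"
# After the minute digits: either finish at minutes (decimals allowed),
# or close the integer minutes and continue with seconds.
MIN_TAIL = f"(?:{FRAC}m|m{SEC})"
# After the hour digits: either finish at hours (decimals allowed),
# or close the integer hours and continue with minutes.
HOUR_TAIL = f"(?:{FRAC}h|h{INT}{MIN_TAIL})"

TIME = re.compile(f"{INT}{HOUR_TAIL}|{INT}{MIN_TAIL}|{SEC}")


def is_time_literal(text):
    """True when the whole text is a time literal under this sketch."""
    return TIME.fullmatch(text) is not None


for sample in ["10h30m", "5m30s", "10h20m15s", "15.6s", "10.5h30m", "10h15s"]:
    print(sample, is_time_literal(sample))
```

The four literals from the question are accepted, while 10.5h30m (decimals in a non-final segment) and 10h15s (missing minutes) are rejected.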
    

    Looking at this, it might not be possible to get it into a form amenable to pyLR1, so parsing with decimals allowed in every position and then doing a secondary check might be the best way to do this.
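The two-stage idea can be sketched with plain re as follows. This is only an illustration under stated assumptions: the names NUM, LOOSE and parse_time are hypothetical, NUM reuses the number pattern from the question, and how the secondary check would be attached to a pyLR1 lexer rule is not shown here.

```python
import re

# The question's number pattern: allows .5, 0.5, 5.0, 5. and 5
NUM = r"[0-9]*\.[0-9]+|[0-9]+(?:\.[0-9]*)?"

# Stage 1: permissive match, decimals allowed in every segment,
# every segment optional, order fixed to h, m, s.
LOOSE = re.compile(
    rf"(?:(?P<h>{NUM})h)?(?:(?P<m>{NUM})m)?(?:(?P<s>{NUM})s)?"
)

SCALE = {"h": 3600, "m": 60, "s": 1}


def parse_time(text):
    """Return the total number of seconds, or None for an invalid literal."""
    match = LOOSE.fullmatch(text)
    if match is None:
        return None
    parts = [(unit, match.group(unit)) for unit in "hms"
             if match.group(unit) is not None]
    # Stage 2: secondary checks the permissive regex does not enforce.
    if not parts:  # the empty string matches LOOSE
        return None
    units = "".join(unit for unit, _ in parts)
    if units == "hs":  # hours followed directly by seconds is not allowed
        return None
    if any("." in value for _, value in parts[:-1]):
        return None  # decimals only in the least significant segment
    return sum(float(value) * SCALE[unit] for unit, value in parts)


print(parse_time("10h30m"))    # 37800.0
print(parse_time("10.5h30m"))  # None
```

Keeping the regex permissive and moving the rules into ordinary code makes each rule readable on its own, at the cost of the lexer accepting some strings that are only rejected afterwards.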
