Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 835621
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T04:51:10+00:00 2026-05-15T04:51:10+00:00

There are some features in modern regex engines which allow you to match languages

  • 0

There are some features in modern regex engines which allow you to match languages that couldn’t be matched without that feature. For example the following regex using back references matches the language of all strings that consist of a word that repeats itself: (.+)\1. This language is not regular and can’t be matched by a regex that does not use back references.

Does lookaround also affect which languages can be matched by a regular expression? I.e. are there any languages that can be matched using lookaround that couldn’t be matched otherwise? If so, is this true for all flavors of lookaround (negative or positive lookahead or lookbehind) or just for some of them?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T04:51:11+00:00Added an answer on May 15, 2026 at 4:51 am

    As the other answers claim, lookarounds don’t add any extra power to regular expressions.

    I think we can show this using the following:

    One Pebble 2-NFA (see the Introduction section of the paper which refers to it).

    The 1-pebble 2NFA does not deal with nested lookaheads, but, we can use a variant of multi-pebble 2NFAs (see section below).

    Introduction

    A 2-NFA is a non deterministic finite automaton which has the ability to move either left or right on it’s input.

    A one pebble machine is where the machine can place a pebble on the input tape (i.e. mark a specific input symbol with a pebble) and do possibly different transitions based on whether there is a pebble at the current input position or not.

    It is known the One Pebble 2-NFA has the same power as a regular DFA.

    Non-nested Lookaheads

    The basic idea is as follows:

    The 2NFA allows us to backtrack (or ‘front track’) by moving forward or backward in the input tape. So for a lookahead we can do the match for the lookahead regular expression and then backtrack what we have consumed, in matching the lookahead expression. In order to know exactly when to stop backtracking, we use the pebble! We drop the pebble before we enter the dfa for the lookahead to mark the spot where the backtracking needs to stop.

    Thus at the end of running our string through the pebble 2NFA, we know whether we matched the lookahead expression or not and the input left (i.e. what is left to be consumed) is exactly what is required to match the remaining.

    So for a lookahead of the form u(?=v)w

    We have the DFAs for u, v and w.

    From the accepting state (yes, we can assume there is only one) of DFA for u, we make an e-transition to the start state of v, marking the input with a pebble.

    From an accepting state for v, we e-transtion to a state which keeps moving the input left, till it finds a pebble, and then transitions to start state of w.

    From a rejecting state of v, we e-transition to a state which keeps moving left until it finds the pebble, and transtions to the accepting state of u (i.e where we left off).

    The proof used for regular NFAs to show r1 | r2, or r* etc, carry over for these one pebble 2nfas. See http://www.coli.uni-saarland.de/projects/milca/courses/coal/html/node41.html#regularlanguages.sec.regexptofsa for more info on how the component machines are put together to give the bigger machine for the r* expression etc.

    The reason why the above proofs for r* etc work is that the backtracking ensures that the input pointer is always at the right spot, when we enter the component nfas for repetition. Also, if a pebble is in use, then it is being processed by one of the lookahead component machines. Since there are no transitions from lookahead machine to lookahead machine without completely backtracking and getting back the pebble, a one pebble machine is all that is needed.

    For eg consider ([^a] | a(?=…b))*

    and the string abbb.

    We have abbb which goes through the peb2nfa for a(?=…b), at the end of which we are at the state: (bbb, matched) (i.e in input bbb is remaining, and it has matched ‘a’ followed by ‘..b’). Now because of the *, we go back to the beginning (see the construction in the link above), and enter the dfa for [^a]. Match b, go back to beginning, enter [^a] again two times, and then accept.

    Dealing with Nested Lookaheads

    To handle nested lookaheads we can use a restricted version of k-pebble 2NFA as defined here: Complexity Results for Two-Way and Multi-Pebble Automata and their Logics (see Definition 4.1 and Theorem 4.2).

    In general, 2 pebble automata can accept non-regular sets, but with the following restrictions, k-pebble automata can be shown to be regular (Theorem 4.2 in above paper).

    If the pebbles are P_1, P_2, …, P_K

    • P_{i+1} may not be placed unless P_i is already on the tape and P_{i} may not be picked up unless P_{i+1} is not on the tape. Basically the pebbles need to be used in a LIFO fashion.

    • Between the time P_{i+1} is placed and the time that either P_{i} is picked up or P_{i+2} is placed, the automaton can traverse only the subword located between the current location of P_{i} and the end of the input word that lies in the direction of P_{i+1}. Moreover, in this sub-word, the automaton can act only as a 1-pebble automaton with Pebble P_{i+1}. In particular it is not allowed to lift up, place or even sense the presence of another pebble.

    So if v is a nested lookahead expression of depth k, then (?=v) is a nested lookahead expression of depth k+1. When we enter a lookahead machine within, we know exactly how many pebbles have to have been placed so far and so can exactly determine which pebble to place and when we exit that machine, we know which pebble to lift. All machines at depth t are entered by placing pebble t and exited (i.e. we return to processing of a depth t-1 machine) by removing pebble t. Any run of the complete machine looks like a recursive dfs call of a tree and the above two restrictions of the multi-pebble machine can be catered to.

    Now when you combine expressions, for rr1, since you concat, the pebble numbers of r1 must be incremented by the depth of r. For r* and r|r1 the pebble numbering remains the same.

    Thus any expression with lookaheads can be converted to an equivalent multi-pebble machine with the above restrictions in pebble placement and so is regular.

    Conclusion

    This basically addresses the drawback in Francis’s original proof: being able to prevent the lookahead expressions from consuming anything which are required for future matches.

    Since Lookbehinds are just finite string (not really regexs) we can deal with them first, and then deal with the lookaheads.

    Sorry for the incomplete writeup, but a complete proof would involve drawing a lot of figures.

    It looks right to me, but I will be glad to know of any mistakes (which I seem to be fond of :-)).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

There is a common feature of modern browsers where a user can select some
After messing about with F# there are some really nice features that I think
We are in the process of refactoring some code. There is a feature that
There are some features within my application that are supposed to be based on
We want to implement a sitemap.xml feature in out CMS system. There is some
(I'm developing in Python 3.1, so if there's some shiny new 3.x feature I
Is there any Spring 3 feature to execute some methods when the application starts
Is there some way when sending this message to specify that I rather have
Problem: My app must run on iPhone OS 3.0. However, there are some features
Is there some way to redirect a user that reaches gets the AccessDenied-error on

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.