The Archive Base


Editorial Team
Asked: May 16, 2026


After reading polygenelubricants's series of articles on advanced regular expression techniques (particularly How does this Java regex detect palindromes?), I decided to attempt to create my own PCRE regex to parse a palindrome, using recursion (in PHP).

What I came up with was:

^(([a-z])(?1)\2|[a-z]?)$

My understanding of this expression is that it should match either zero or one characters (every string of fewer than two characters is trivially a palindrome, and this also covers the middle character of odd-length palindromes in the recursion), or two of the same character separated by a recursion of the pattern.

Unfortunately, it does not seem to be working that way, as you can see at http://www.ideone.com/a9T3F. Instead, only strings of 2^n - 1 repeated characters (i.e. the empty string, a, aaa, aaaaaaa, a^15) match the regular expression.

Oddly, if I modify my pattern so that the recursion is optional (i.e. ^(([a-z])(?1)?\2|[a-z]?)$, see http://www.ideone.com/D6lJR), it matches only the empty string and strings of a character repeated 2^n times (i.e. a, aa, aaaa, aaaaaaaa, a^16).

Why is my regex not working the way I expect it to?

Note for the people who are itching to suggest not to use regex:
The point of this question is to learn how to use recursive regular expressions properly. I know that this is not an effective way to determine if a string is a palindrome, and I wouldn’t use a recursive regex if I for some reason had to determine palindromes in production code; I am just interested in learning more about the advanced aspects of regex.


1 Answer

    Editorial Team, added an answer on May 16, 2026 at 10:17 pm

    The phenomenon you’re observing is due to the fact that PCRE subpattern recursion is atomic, unlike in Perl. The man page actually covers this problem in great detail:

    In PCRE (like Python, but unlike Perl), a recursive subpattern call is
    always treated as an atomic group. That is, once it has matched some of
    the subject string, it is never re-entered, even if it contains untried
    alternatives and there is a subsequent matching failure.

    This can be illustrated by the following pattern, which purports to match a palindromic string that contains an odd number of characters (for example,
    "a", "aba", "abcba", "abcdcba"):

        ^(.|(.)(?1)\2)$
    

    The idea is that it either matches a single character, or two identical
    characters surrounding a sub-palindrome. In Perl, this pattern works;
    in PCRE it does not if the pattern is longer than three characters.

    Consider the subject string "abcba":

    At the top level, the first character is matched, but as it is not at
    the end of the string, the first alternative fails; the second alternative
    is taken and the recursion kicks in. The recursive call to subpattern 1
    successfully matches the next character ("b"). (Note that the beginning
    and end of line tests are not part of the recursion).

    Back at the top level, the next character ("c") is compared with what
    subpattern 2 matched, which was "a". This fails. Because the recursion
    is treated as an atomic group, there are now no backtracking points,
    and so the entire match fails. (Perl is able, at this point, to re-
    enter the recursion and try the second alternative.) However, if the
    pattern is written with the alternatives in the other order, things are
    different:

        ^((.)(?1)\2|.)$
    

    This time, the recursing alternative is tried first, and continues to
    recurse until it runs out of characters, at which point the recursion
    fails. But this time we do have another alternative to try at the
    higher level. That is the big difference: in the previous case the
    remaining alternative is at a deeper recursion level, which PCRE cannot use.

    To change the pattern so that it matches all palindromic strings, not just
    those with an odd number of characters, it is tempting to change the
    pattern to this:

        ^((.)(?1)\2|.?)$
    

    Again, this works in Perl, but not in PCRE, and for the same reason.
    When a deeper recursion has matched a single character, it cannot be
    entered again in order to match an empty string. The solution is to
    separate the two cases, and write out the odd and even cases as alternatives
    at the higher level:

        ^(?:((.)(?1)\2|)|((.)(?3)\4|.))$
    

    WARNING!!!

    The palindrome-matching patterns above work only if the subject string does not start with a palindrome that is shorter than the
    entire string. For example, although "abcba" is correctly matched, if
    the subject is "ababa", PCRE finds the palindrome "aba" at the start,
    then fails at top level because the end of the string does not follow.
    Once again, it cannot jump back into the recursion to try other alternatives,
    so the entire match fails.
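
    The behaviour quoted above can be checked with a small hand-written model. The sketch below is not PCRE itself: it is a Python simulation of the final pattern, ^(?:((.)(?1)\2|)|((.)(?3)\4|.))$, under the assumption that a recursive call is atomic, as the man page describes. The function names are hypothetical, and the model assumes the subject contains no newlines (so "." matches any character).

```python
def palindrome_group_ends(s, pos, odd):
    """Yield every position where the group can end when it starts at `pos`,
    in the order a backtracking engine would try them.

    odd=False models the even-length group (second alternative is empty);
    odd=True models the odd-length group (second alternative is one character).
    """
    # First alternative: capture one character, recurse, then backreference.
    if pos < len(s):
        captured = s[pos]
        # Atomic recursion (assumed): only the FIRST way the inner call
        # succeeds is kept; its untried alternatives are thrown away.
        inner_end = next(palindrome_group_ends(s, pos + 1, odd), None)
        if inner_end is not None and inner_end < len(s) and s[inner_end] == captured:
            yield inner_end + 1
    # Second alternative: a single character (odd) or nothing (even).
    if odd:
        if pos < len(s):
            yield pos + 1
    else:
        yield pos


def matches_palindrome_pattern(s):
    """Model of the anchored pattern: try the even group, then the odd group.

    The top-level groups are ordinary groups, so all of their alternatives
    may be tried before giving up.
    """
    for odd in (False, True):
        if any(end == len(s) for end in palindrome_group_ends(s, 0, odd)):
            return True
    return False


for subject in ["", "a", "abba", "abcba", "ababa"]:
    print(repr(subject), matches_palindrome_pattern(subject))
```

    In this model "abba" and "abcba" match, while "ababa" does not: the first level ends after the leading "aba", and the deeper levels cannot be re-entered, which is the situation the WARNING describes.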

    Additional references

    • regular-expressions.info/Atomic grouping
      • (?>…) is the atomic grouping syntax in some flavors
      • Lookarounds (?=…), (?!…), (?<=…), (?<!…), are all atomic
      • Possessive quantifier (e.g. a*+) is also atomic
      • PCRE recursive subpattern and subroutine calls are also atomic

    A closer look at the pattern

    The atomicity argument is correct, but perhaps it’s not obvious how it explains why the pattern behaves as observed. Let’s take a closer look and see how this all fits:

    We will use the first pattern:

    ^(([a-z])(?1)\2|[a-z]?)$
    

    I will use the following notation to denote the recursion:

    • 1 means the character was captured into group 2 in the first alternative
    • 2 means the character was matched by the second alternative
      • If the 2 is not above a character, the zero-repetition option of ? was taken
    • \ means the character was matched by the backreference to group 2 in the first alternative
    • _ denotes the bottom of a recursive branch
      • This branch will NOT be reentered even if there are other alternatives!

    Now let’s consider "aaa" as input:

          _
    1 1 1 2 
    a a a   # This is the first bottom of the recursion,
            # now we go back to the third 1 and try to match \.
            # This fails, so the third 1 becomes 2.
        _
    1 1 2
    a a a   # Now we go back to the second 1 and try to match \.
            # This fails, so the second 1 becomes 2.
      _
    1 2
    a a a   # The second level matched! now we go back to the first level...
    
    _____
    1 2 \
    a a a   # Now the first 1 can match \, and entire pattern matches!!
    

    Now consider "aaaaa":

              _
    1 1 1 1 1 2
    a a a a a  # Fifth 1 can't match \, so it becomes 2. 
            _
    1 1 1 1 2
    a a a a a  # Fourth 1 can't match \, so it becomes 2.
        _____
    1 1 1 2 \
    a a a a a  # Here's a crucial point. The third 1 successfully matched.
               # Now we're back to the second 1 and try to match \, but this fails.
               # However, since PCRE recursion is atomic, the third 1 will NOT be
               # reentered to try 2. Instead, we try 2 on the second 1.
    _____
    1 2 \
    a a a a a  # Anchors don't match, so the first 1 becomes 2, and then also the
               # anchors don't match, so the pattern fails to match.
    

    Note that once a recursion level matches on the first alternative, the second alternative will not be attempted in the future (even if doing so might result in a match), because PCRE subpattern recursion is atomic.


    Now consider "aa":

        _
    1 1 2 
    a a
      _
    1 2
    a a  # The second level matched by taking the one repetition option on ?.
         # We now go back to the first level, and we can't match \.
         # Since PCRE recursion is atomic, we can't go back to the second level
         # to try the zero repetition option on ?.
    _    
    2
    a a  # Anchors don't match, trying zero option on ? also doesn't help,
         # so the pattern fails to match!
    

    Note that once a recursion level matches on the one-repetition option of the ? in the second alternative, the zero-repetition option will not be attempted in the future (even if doing so might result in a match), because PCRE subpattern recursion is atomic.


    Now let’s consider "aaaaaaa":

                  _
    1 1 1 1 1 1 1 2  
    a a a a a a a 
                _
    1 1 1 1 1 1 2  
    a a a a a a a 
            _____
    1 1 1 1 1 2 \  
    a a a a a a a  # A crucial point: the fifth level matched and now the fourth
                   # level can't match \, but it does NOT reenter the fifth level to
                   # try 2. Instead, the fourth level tries 2.
        _____    
    1 1 1 2 \  
    a a a a a a a 
      _________    
    1 1 1 2 \ \ 
    a a a a a a a 
    _____________    
    1 1 1 2 \ \ \
    a a a a a a a  # Entire pattern is a match! 
    

    Note that even though PCRE subpattern recursion is atomic, it can still successfully match a palindrome consisting of a character repeated 2^n - 1 times.


    Now, just for fun, let’s try "abcba":

              _
    1 1 1 1 1 2
    a b c b a
            _
    1 1 1 1 2
    a b c b a
    
    1 1 1 2 
    a b c b a   # Third level attempts \, but c does not match a!
                # So we go back to third 1 and try 2.
      _____
    1 1 2 \ 
    a b c b a 
    _________
    1 1 2 \ \
    a b c b a   # Entire pattern is a match!
    

    That is, the pattern doesn’t match “only when a character repeats 2^n - 1 times”. It can indeed match "abcba" (as seen on ideone.com). It can NOT, however, match "ababa", nor can it match "aaaaa" (see the WARNING on the man page!), because subpattern recursion in PCRE is atomic.

    You can apply this same tracing process to explain the behavior of the pattern on any input.
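
    The tracing process can also be mechanised. The following is a minimal Python sketch, not a real regex engine: it models only the pattern from the question, ^(([a-z])(?1)\2|[a-z]?)$, and its optional-recursion variant, assuming a recursive call is atomic (only its first successful result is kept). The names group1_ends, matches and matching_lengths are hypothetical helpers introduced for illustration.

```python
def is_lower(ch):
    """Model of the character class [a-z]."""
    return "a" <= ch <= "z"


def group1_ends(s, pos, optional_recursion=False):
    """Yield every position where group 1 can end when it starts at `pos`,
    in the order a backtracking engine would try them."""
    # First alternative: ([a-z]) then (?1) (or (?1)?) then backreference 2.
    if pos < len(s) and is_lower(s[pos]):
        captured = s[pos]  # value of group 2 at this recursion level
        # Atomic recursion (assumed): keep only the first result of the
        # inner call. The inner call always succeeds, because its second
        # alternative can match the empty string.
        candidates = [next(group1_ends(s, pos + 1, optional_recursion))]
        if optional_recursion:
            # Zero repetitions of (?1)?: skip the recursive call entirely.
            candidates.append(pos + 1)
        for end in candidates:
            if end < len(s) and s[end] == captured:  # backreference matches
                yield end + 1
    # Second alternative: [a-z]? is greedy, so one character, then none.
    if pos < len(s) and is_lower(s[pos]):
        yield pos + 1
    yield pos


def matches(s, optional_recursion=False):
    """Model of the anchored pattern. The top-level group is an ordinary
    group, so every alternative it yields may be tried against the anchor."""
    return any(end == len(s) for end in group1_ends(s, 0, optional_recursion))


def matching_lengths(limit, optional_recursion=False):
    """Lengths n <= limit for which the string 'a' * n matches."""
    return [n for n in range(limit + 1) if matches("a" * n, optional_recursion)]


print(matching_lengths(16))                           # original pattern
print(matching_lengths(16, optional_recursion=True))  # with (?1)?
print(matches("abcba"), matches("ababa"), matches("aaaaa"))
```

    Under these assumptions the model reproduces the observations from the question: the original pattern accepts runs of length 0, 1, 3, 7 and 15, the optional-recursion variant accepts 0, 1, 2, 4, 8 and 16, and "abcba" matches while "ababa" and "aaaaa" do not.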


© 2021 The Archive Base. All Rights Reserved