The Archive Base

Editorial Team
Asked: June 15, 2026

Let’s say we have a string: abcbcdcde I want to identify all substrings that

Let’s say we have a string: “abcbcdcde”

I want to identify all substrings that are repeated in this string using regex (i.e. no brute-force iterative loops).

For the above string, the result set would be: {“b”, “bc”, “c”, “cd”, “d”}

I must confess that my regex is far more rusty than it should be for someone with my experience. I tried using a backreference, but that’ll only match consecutive duplicates. I need to match all duplicates, consecutive or otherwise.

In other words, I want to match any character(s) that appears for the >= 2nd time. If a substring occurs 5 times, then I want to capture each of occurrences 2-5. Make sense?

This is my pathetic attempt thus far:

preg_match_all( '/(.+)(.*)\1+/', $string, $matches );  // Way off!

I tried playing with look-aheads but I’m just butchering it. I’m doing this in PHP (PCRE) but the problem is more or less language-agnostic. It’s a bit embarrassing that I’m finding myself stumped on this.

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 10:03 pm

    Your problem is recursi … you know what, forget about recursion! =p It wouldn’t really work well in PHP, and the algorithm is pretty clear without it.

    function find_repeating_sequences($s)
    {
        $res = array();
        while ($s) {
            $i = 1; $pat = $s[0];
            while (false !== strpos($s, $pat, $i)) {
                $res[$pat] = 1;
                // expand pattern and try again
                $pat .= $s[$i++];
            }
            // move the string forward
            $s = substr($s, 1);
        }
        return array_keys($res);
    }
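The offset passed to strpos() is what restricts the search to the text after the current pattern: $i always equals the pattern's length, so a repeat only counts if it starts at or after the pattern's end. A small sketch of that behaviour, assuming the string from the question after one step of the outer loop (the variable names are only for illustration):

```php
<?php
// The question's string after the outer loop has dropped the leading "a".
$s = 'bcbcdcde';

// Pattern "bc" has length 2, so the search starts at index 2 and finds
// the second copy right there.
$repeat = strpos($s, 'bc', 2);

// Pattern "bcb" has length 3; no copy starts at index 3 or later.
$noRepeat = strpos($s, 'bcb', 3);

// Overlapping copies are not counted: in "aaa", "aa" occurs at 0 and 1,
// but the search for a second copy only starts at index 2.
$overlap = strpos('aaa', 'aa', 2);

var_dump($repeat, $noRepeat, $overlap);
```

The strict comparison with false in the function matters for the same reason it does here: a hit at index 0 would otherwise be mistaken for a miss.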
    

    Out of interest, I wrote Tim’s answer in PHP as well:

    function find_repeating_sequences_re($s)
    {
        $res = array();
        preg_match_all('/(?=(.+).*\1)/', $s, $matches);
        foreach ($matches[1] as $match) {
            $length = strlen($match);
            if ($length > 1) {
                for ($i = 0; $i < $length; ++$i) {
                    for ($j = $i; $j < $length; ++$j) {
                        $res[substr($match, $i, $j - $i + 1)] = 1;
                    }
                }
            } else {
                $res[$match] = 1;
            }
        }
        return array_keys($res);
    }
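To see why the nested loops are needed, it helps to look at what the lookahead captures on its own. A minimal sketch using the string from the question; at each position the greedy (.+) settles on the longest substring that occurs again later, so shorter repeats such as "b" never appear as separate matches:

```php
<?php
$s = 'abcbcdcde';

// The lookahead is zero-width, so the engine can match at every position
// and capture overlapping candidates in group 1.
preg_match_all('/(?=(.+).*\1)/', $s, $matches);

// Only the longest repeated substring per starting position is captured.
print_r($matches[1]);
```

For this input the captures are "bc", "c", "cd" and "d". Expanding each capture into all of its substrings is what recovers "b", since every substring of a repeated string is itself repeated.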
    

    I’ve let them fight it out in a small benchmark on 800 bytes of base64-encoded random data (600 random bytes before encoding):

    $data = base64_encode(openssl_random_pseudo_bytes(600));
    

    Each function is run for 10 rounds and the execution time is measured. The results?

    Pure PHP      - 0.014s (10 runs)
    PCRE          - 40.86s <-- ouch!
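A timing loop along these lines could reproduce that kind of measurement. This is only a sketch, not the harness actually used: time_rounds() is a hypothetical helper, random_bytes() (PHP 7+) is swapped in for openssl_random_pseudo_bytes(), and strrev is a stand-in for the functions under test.

```php
<?php
// Hypothetical helper: calls $fn on $data for $rounds rounds and returns
// the total elapsed wall-clock time in seconds.
function time_rounds(callable $fn, $data, $rounds = 10)
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; ++$i) {
        $fn($data);
    }
    return microtime(true) - $start;
}

// 600 random bytes become 800 base64 characters.
$data = base64_encode(random_bytes(600));

// 'strrev' is only a placeholder; pass 'find_repeating_sequences' or
// 'find_repeating_sequences_re' once those functions are defined.
$elapsed = time_rounds('strrev', $data);
printf("%.3fs (10 runs)\n", $elapsed);
```

Because the input is random, absolute numbers will vary from run to run; only the relative difference between the two functions is meaningful.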
    

    It gets weirder when you look at 24k bytes (or anything above 1k really):

    Pure PHP      - 4.565s (10 runs)
    PCRE          - 0.232s <-- WAT?!
    

    It turns out that the regular expression broke down after 1k characters and so the $matches array was empty. These are my .ini settings:

    pcre.backtrack_limit => 1000000 => 1000000
    pcre.recursion_limit => 100000 => 100000
    

    It’s not clear to me how a backtrack or recursion limit would have been hit after only 1k characters. But even if those settings are “fixed” somehow, the results are still obvious: PCRE doesn’t seem to be the answer.
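preg_match_all() returns false instead of raising an error when PCRE gives up, which would explain an empty $matches array and a suspiciously fast run. One plausible cause, not verified here, is that (.+).*\1 can try on the order of n² combinations per starting position, so the work grows quickly with input length. A minimal sketch of making the failure visible with preg_last_error(); the wrapper name and the choice of RuntimeException are assumptions for illustration:

```php
<?php
// Hypothetical wrapper around the lookahead pattern that refuses to
// return silently when the regex engine aborts.
function lookahead_matches($s)
{
    $count = preg_match_all('/(?=(.+).*\1)/', $s, $matches);
    if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
        // For example PREG_BACKTRACK_LIMIT_ERROR when
        // pcre.backtrack_limit is exceeded.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $matches[1];
}

print_r(lookahead_matches('abcbcdcde'));
```

With a check like this, the 24k benchmark would fail loudly rather than report a misleadingly short time.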

    I suppose writing this in C would speed it up somewhat, but I’m not sure to what degree.

    Update

    With some help from hakre’s answer I put together an improved version that increases performance by ~18% after optimizing the following:

    1. Remove the substr() calls in the outer loop to advance the string pointer; this was a leftover from my previous recursive incarnations.

    2. Use the partial results as a positive cache to skip strpos() calls inside the inner loop.

    And here it is, in all its glory (:

    function find_repeating_sequences3($s)
    {
        $res = array(); 
        $p   = 0;
        $len = strlen($s);
    
        while ($p != $len) {
            $pat = $s[$p]; $i = ++$p;
            while ($i != $len) {
                if (!isset($res[$pat])) {
                    if (false === strpos($s, $pat, $i)) {
                        break;
                    }
                    $res[$pat] = 1;
                }
                // expand pattern and try again
                $pat .= $s[$i++];
            }
        }
        return array_keys($res);
    }
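One caveat applies to all three functions because they collect results as array keys: PHP converts keys that look like decimal integers into integers, so array_keys() can return a mix of ints and strings when the input contains digits (as base64 data does). A small sketch of the effect and one possible way to normalise it, using made-up sample keys:

```php
<?php
$res = array();
$res['12'] = 1;   // decimal-integer string: stored as int key 12
$res['ab'] = 1;   // stays a string key
$res['07'] = 1;   // leading zero: stays a string key

$keys = array_keys($res);

// Cast back to strings if callers expect every sequence to be a string.
$strings = array_map('strval', $keys);

var_dump($keys, $strings);
```

Whether this matters depends on how the result is consumed; strict comparisons such as in_array($x, $result, true) are where it would show up.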
    
© 2021 The Archive Base. All Rights Reserved