Editorial Team
Asked: May 25, 2026

The following regex pattern, when applied to very long strings (about 60KB), causes Java to appear to “hang”:

.*\Q|\E.*\Q|\E.*\Q|\E.*bundle.*

I don’t understand why.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 7:35 pm

    Basically, “.*” (match any number of anything) is greedy: it first tries to match the entire rest of the string, and if the remainder of the pattern then fails, it gives back one character and tries again, and so on. Using one of these is not much of a problem, but the work multiplies with each additional one: with k of them on a string of length n, a failed or late match can cost roughly on the order of n^k attempts. This is a fairly in-depth (and much more accurate) discussion of this sort of thing: http://discovery.bmc.com/confluence/display/Configipedia/Writing+Efficient+Regex

    EDIT: (I hope you really wanted to know WHY)

    Example Source String:

    aaffg,  ";;p[p09978|ksjfsadfas|2936827634876|2345.4564a bundle of sticks
    

    ONE WAY OF LOOKING AT IT:

    The process takes so long because the first .* matches the entire source string (aaffg, ";;p[p09978|ksjfsadfas|2936827634876|2345.4564a bundle of sticks), only to find that nothing is left for the | that must follow. It then backtracks, giving up one character at a time, until it reaches the last | in the string (...4876|2345...), and lets the next .* match all the way to the end of the string again.

    It then looks for the next | specified in your expression. Not finding one, it backtracks to the | it matched first (the one in ...4876|2345...), discards that match, and finds the closest | before it (...dfas|2936...), so that the second | in your expression still has something to match.

    It then matches the second .* to 2936827634876, the second | to the one in ...4876|2345..., and the next .* to the remaining text, only to find that you wanted yet another |. It keeps backtracking like this, again and again, until every | in your expression has been matched.

    ANOTHER WAY OF LOOKING AT IT:

    (Original expression):

    .*\Q|\E.*\Q|\E.*\Q|\E.*bundle.*
    

    this roughly translates to

    match:
                   any number of anything, 
    followed by    a single '|', 
    followed by    any number of anything, 
    followed by    a single '|', 
    followed by    any number of anything, 
    followed by    a single '|', 
    followed by    any number of anything,
    followed by    the literal string 'bundle',
    followed by    any number of anything
    

    The problem is that “any number of anything” includes | symbols, so the engine has to re-scan the entire string over and over again, when what you really mean is “any number of anything that is not a '|'”.

    To fix or improve the expression, I would recommend three things:

    First (and most significant), replace most of the “match anything”s (.*) with negated character classes ([^|]*), like so:

    [^|]*\Q|\E[^|]*\Q|\E[^|]*\Q|\E.*bundle.*
    

    …this prevents the engine from matching to the end of the string over and over again. Instead, each [^|]* matches all the non-| characters up to the next | symbol, then the | itself is matched, then the next field, and so on.
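To make the difference concrete, here is a minimal Java sketch. The class name, the capturing groups, and the short input with four pipes are all assumptions added for illustration; they are not part of the original question. It shows which text the first "field" ends up holding under each style of pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeFieldDemo {
    // Same shape as the original pattern, with capturing groups added for inspection.
    static final Pattern GREEDY =
            Pattern.compile("(.*)\\|(.*)\\|(.*)\\|(.*)bundle(.*)");

    // Same shape, but the first three fields are not allowed to contain '|'.
    static final Pattern NEGATED =
            Pattern.compile("([^|]*)\\|([^|]*)\\|([^|]*)\\|(.*)bundle(.*)");

    /** Returns the text captured by group 1, or null if the whole input does not match. */
    static String firstField(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "a|b|c|d|e bundle"; // hypothetical input with four pipes

        // Greedy .* swallows as many pipes as it can while still leaving three.
        System.out.println(firstField(GREEDY, input));  // a|b

        // [^|]* must stop at the very first pipe, so there is nothing to retry.
        System.out.println(firstField(NEGATED, input)); // a
    }
}
```

Both patterns accept this input, but they split it differently: the greedy version only finds its split after trying and discarding longer candidates, while the negated class has exactly one possible stopping point per field.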

    The second change (somewhat significant, depending on your source string) is to make the second-to-last “match any number of anything” (.*) a “lazy” or “reluctant” quantifier (.*?). It then advances one character at a time, checking for bundle as it goes, instead of skipping past bundle to the end of the string and then having to backtrack to find it. This results in:

    [^|]*\Q|\E[^|]*\Q|\E[^|]*\Q|\E.*?bundle.*
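
A small sketch of the greedy versus reluctant behaviour, under the assumption of a made-up input that contains bundle twice (the class name, helper, and input are hypothetical). The capturing group is only there to show how far each quantifier travelled before bundle was matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuantifierDemo {
    /** Returns what group 1 captured before the literal "bundle", or null if nothing matched. */
    static String prefixBeforeBundle(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "one bundle, two bundle"; // hypothetical input

        // Greedy: runs to the end of the input, then backs up to the LAST "bundle".
        System.out.println("[" + prefixBeforeBundle("(.*)bundle.*", input) + "]");
        // prints [one bundle, two ]

        // Reluctant: grows one character at a time and stops at the FIRST "bundle".
        System.out.println("[" + prefixBeforeBundle("(.*?)bundle.*", input) + "]");
        // prints [one ]
    }
}
```

The reluctant form helps most when bundle appears early and a long tail follows it. If bundle is absent altogether, both forms still have to scan the remaining text.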
    

    The third change I would recommend is for readability: replace each \Q|\E block with a single escape, \|, like so:

    [^|]*\|[^|]*\|[^|]*\|.*?bundle.*
    

    This is how the expression is processed internally anyway: the engine converts \Q...\E into “escape every special character between \Q and \E”. \Q\E is only a shorthand, and if it does not make your expression shorter or easier to read, it should not be used. Period.

    The negated character classes contain an unescaped | because | is not a special character inside a character class, but let’s not digress too much. You can escape it there if you’d like, but you don’t have to.
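
As a quick check of the escaping claims, the sketch below (class and method names are hypothetical) confirms that the three spellings of a literal pipe behave the same in java.util.regex, and shows that Pattern.quote produces the \Q...\E form. Note that every regex backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class PipeEscapeDemo {
    /** True if the given regex matches exactly one literal '|' character. */
    static boolean matchesSinglePipe(String regex) {
        return Pattern.matches(regex, "|");
    }

    public static void main(String[] args) {
        System.out.println(matchesSinglePipe("\\Q|\\E")); // true: quoted block
        System.out.println(matchesSinglePipe("\\|"));     // true: single escape
        System.out.println(matchesSinglePipe("[|]"));     // true: unescaped inside a class

        // Pattern.quote wraps its argument in \Q...\E, which is handy for
        // user-supplied text but noisy for a single known character.
        System.out.println(Pattern.quote("|"));           // \Q|\E
    }
}
```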

    The final expression translates roughly to:

    match:
                   any number of anything that is not a '|', 
    followed by    a single '|', 
    followed by    any number of anything that is not a '|', 
    followed by    a single '|', 
    followed by    any number of anything that is not a '|', 
    followed by    a single '|', 
    followed by    any number of anything, up until the next expression can be matched,
    followed by    the literal string 'bundle',
    followed by    any number of anything
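
Put together, the expression described by this breakdown could be used from Java roughly as follows. This is a sketch only: the class and method names are hypothetical, and it assumes the whole string is tested with matches(), which the question does not state.

```java
import java.util.regex.Pattern;

class BundleMatcher {
    // The expression from the breakdown above, written as a Java string literal
    // (each regex backslash is doubled). Compiled once and reused.
    static final Pattern FINAL_PATTERN =
            Pattern.compile("[^|]*\\|[^|]*\\|[^|]*\\|.*?bundle.*");

    /** True if the line has three pipe-terminated fields followed by text containing "bundle". */
    static boolean isBundleLine(String line) {
        return FINAL_PATTERN.matcher(line).matches();
    }

    public static void main(String[] args) {
        // The example source string from the answer.
        String sample =
                "aaffg,  \";;p[p09978|ksjfsadfas|2936827634876|2345.4564a bundle of sticks";
        System.out.println(isBundleLine(sample));                  // true
        System.out.println(isBundleLine("one|pipe only, bundle")); // false: too few pipes
    }
}
```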
    

    A good tool that I use (though it costs some money) is RegexBuddy. Its free companion website for understanding regexes is http://www.regular-expressions.info, and the particular page that explains repetition is http://www.regular-expressions.info/repeat.html

    RegexBuddy emulates other regex engines, and it reports that your original regex takes 544 “steps” to match the example string, as opposed to 35 “steps” for the version I provided.

    SLIGHTLY LONGER Example Source String A:

    aaffg,  ";;p[p09978|ksjfsadfas|12936827634876|2345.4564a bundle of sticks
    

    SLIGHTLY LONGER Example Source String B:

    aaffg,  ";;p[p09978|ksjfsadfas|2936827634876|2345.4564a bundle of sticks4me
    

    Longer source string A (a 1 added before 2936827634876) did not affect my suggested replacement, but added 6 steps to the original.

    Longer source string B (‘4me’ added at the end of the string) again did not affect my suggested replacement, but added 48 steps to the original.

    Thus, depending on how a string differs from the examples above, a 60KB string might take only 544 steps, or it might take more than a million.
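
Without RegexBuddy, one rough way to compare the work done by two patterns in Java is to count how many characters the engine reads. The sketch below does that with a hypothetical CharSequence wrapper. It assumes that java.util.regex reads its input through charAt, so the numbers are only a proxy for effort and are not comparable to RegexBuddy's step counts. The generated input (many pipes, no bundle) is also an assumption, chosen to force as much backtracking as possible.

```java
import java.util.regex.Pattern;

class RegexWorkCounter {
    /** Wraps a CharSequence and counts how often a character is read from it. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence inner;
        long reads;

        CountingSequence(CharSequence inner) { this.inner = inner; }

        @Override public char charAt(int index) { reads++; return inner.charAt(index); }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return inner.subSequence(start, end);
        }
        @Override public String toString() { return inner.toString(); }
    }

    /** Runs one full-string match attempt and returns the number of charAt calls it made. */
    static long countReads(String regex, String input) {
        CountingSequence seq = new CountingSequence(input);
        Pattern.compile(regex).matcher(seq).matches();
        return seq.reads;
    }

    /** Hypothetical hard input: many pipe-terminated fields and no "bundle" anywhere. */
    static String pipesWithoutBundle(int fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields; i++) {
            sb.append("field").append(i).append('|');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = pipesWithoutBundle(40);
        long original = countReads(".*\\|.*\\|.*\\|.*bundle.*", input);
        long improved = countReads("[^|]*\\|[^|]*\\|[^|]*\\|.*?bundle.*", input);
        System.out.println("original: " + original + " character reads");
        System.out.println("improved: " + improved + " character reads");
    }
}
```

Increasing the argument to pipesWithoutBundle should make the gap between the two counts widen quickly, which is the same effect the step counts above illustrate.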
