
Question


Asked: May 12, 2026


I have a few regular expressions which are run against very long strings. However, the only part of the string which concerns the RE is near the beginning. Most of the REs are similar to:

\\s+?(\\w+?).*

The REs capture a few groups near the start, and don’t care what the rest of the string is. For performance reasons, is there a way to have the RE engine avoid looking at all the characters consumed by the trailing .* at the end of the pattern?

Note: The application with the REs is written using the java.util.regex classes.

Edit: For example I have the following RE:

.*?id="number"[^>]*?>([^<]+?).*

Which is run against large HTML files which are stored as StringBuilders. The tag with id="number" is always near the start of the HTML file.


1 Answer

Editorial Team · Answer 1 · 2026-05-12T06:23:03+00:00

When using the java.util.regex classes, there are a number of ways to match against a given string. Matcher.matches always matches against the whole input string. Matcher.find looks for something matching your regular expression somewhere within the input string. Finally, Matcher.lookingAt matches your regular expression against the beginning of your input string.
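To make the difference between the three methods concrete, here is a minimal sketch. The class name MatcherModes, its helper methods, and the digits-only pattern are hypothetical and chosen only for illustration; the Matcher calls themselves are the standard java.util.regex ones.

```java
import java.util.regex.Pattern;

public class MatcherModes {
    // Hypothetical pattern for illustration: one or more digits.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the pattern must account for the entire input.
    static boolean wholeInput(CharSequence s) {
        return DIGITS.matcher(s).matches();
    }

    // find(): the pattern may match anywhere in the input.
    static boolean anywhere(CharSequence s) {
        return DIGITS.matcher(s).find();
    }

    // lookingAt(): the pattern must match starting at the beginning,
    // but does not have to cover the rest of the input.
    static boolean atStart(CharSequence s) {
        return DIGITS.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("123abc")); // false
        System.out.println(anywhere("abc123"));   // true
        System.out.println(atStart("123abc"));    // true
        System.out.println(atStart("abc123"));    // false
    }
}
```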

If you are using Matcher.matches you may require the .* at the end to match the whole string. However, you might be better off using one of the other methods instead, which would allow you to leave off the .*. It sounds like Matcher.lookingAt may be appropriate for your purposes.
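Applied to the HTML example from the question, this could look roughly like the sketch below. It uses Matcher.find rather than lookingAt, because the tag is near the start of the file but not at position 0. The pattern is an adaptation of the one in the question, not the original: the leading .*? and trailing .* are dropped, and the capture is made greedy (an assumption on my part, since a reluctant [^<]+? with nothing after it would capture only a single character). The class name NumberTagExtractor, the method extractNumber, and the sample markup are hypothetical. A StringBuilder can be passed directly, since Pattern.matcher accepts any CharSequence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberTagExtractor {
    // Adapted from the question's pattern: no leading ".*?" and no trailing ".*".
    // The greedy "[^<]+" is an assumption made so the whole text node is captured.
    private static final Pattern NUMBER_TAG =
            Pattern.compile("id=\"number\"[^>]*>([^<]+)");

    // Returns the text following the id="number" tag, or null if it is not found.
    static String extractNumber(CharSequence html) {
        Matcher m = NUMBER_TAG.matcher(html);
        // find() returns once the first match is located, so the pattern
        // itself never has to consume the remainder of the document.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample document with the tag near the start.
        StringBuilder html = new StringBuilder(
                "<html><body><span id=\"number\" class=\"x\">12345</span>");
        for (int i = 0; i < 100_000; i++) {
            html.append("<p>filler</p>");
        }
        System.out.println(extractNumber(html)); // prints 12345
    }
}
```

Note that when the tag is absent, find still has to scan the whole input before giving up, so the saving only applies to documents that actually contain the match.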
