I'd strongly suggest you only check in safe connection strings,…

Question

0

Asked: May 13, 20262026-05-13T08:47:35+00:00 2026-05-13T08:47:35+00:00

I’m working on a JMD (Java MarkDown) (a Java port of MarkDownSharp ) but

0

I’m working on a JMD (Java MarkDown) (a Java port of MarkDownSharp) but I’m having an issue with one regex in particular. For the file Markdown_Documentation_Syntax.text this regular expression dies:

private static final String BLOCK_TAGS_1 = "p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del";
private static final String BLOCKS_NESTED_PATTERN = String.format("" +
        "(" +                      // save in $1
        "^" +                      // start of line (with MULTILINE)
        "<(%s)" +                  // start tag = $2
        "\\b" +                    // word break
        "(.*\\n)*?" +              // any number of lines, minimally matching
        "</\\2>" +                 // the matching end tag
        "[ \\t]*" +                // trailing spaces/tags
        "(?=\\n+|\\Z)" +           // followed by a newline or end of
        ")", BLOCK_TAGS_1);

which translates to:

(^<(p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del)\b(.*\n)*?</\2>[ \t]*(?=\n+|\Z))

This pattern is looking for accepted block tags that are anchored to the start of a line, followed by any number of lines and then are terminated by a matching tag followed by a newline or a string terminator. This generates:

java.lang.StackOverflowError
    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4357)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4357)
        ...

This can be dealt with by increasing the stack space for Java (defaults to 128k/400k for oss/ss IIRC) but the above expression is slow anyway.

So I’m looking for a regex guru who can do better (or at least explain the performance problem with this pattern). The C# version is a little slow but works fine. PHP seems to have no issues with this either.

Edit: This is on JDK6u17 running on Windows 7 64 Ultimate.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-13T08:47:36+00:00

This part:

(.*\n)*?

will involve A LOT of unnecessary backtracking because of the nested * and since there are chars that have to match afterwards.

I just ran a quick benchmark in perl on some arbitrary strings and got a 13-15% improvement just by switching that piece to

(?>.*\n)*?

which does non-capturing, independent subgrouping. That gives you two benefits, it no longer wastes time capturing the matching string, and more importantly, it no longer backtracks on the innermost .* which is a waste of time anyway. There’s no way that only a portion of that .* will ever result in a valid match so explicitly making it all or nothing should help.

Don’t know if that’s a sufficient improvement in this case, however.

How to approach applying for a job at a company ...

What is a programmer’s life like?

How to handle personal stress caused by utterly incompetent and ...

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions