I’m trying to create a regex containing character set which can contain a period

Question


Asked: May 20, 2026 (2026-05-20T01:50:07+00:00)


I’m trying to create a regex containing a character set that can contain a period or colon but must not end with a period. So I want to match a line saying "Lorem./: Ipsom dolor sit" but not "Lorem ipsum dolor sit."

This is my current regex, but it doesn’t work, because it still matches when the string ends in a period or colon:

/(\n{2,})([ \wåäöÅÄÖ,()%+\-:.]{2,75}[^.:])(\n{1,})/

I’m looking for headings in a huge, badly formatted plain text file. The general pattern in this file is that a heading is always preceded by two or more newlines and always followed by one or more newlines. A heading sometimes ends in a : but never in a . although it may contain a . or : elsewhere. Headings are always 2-75 characters long and are never preceded by another heading.

Any help would be greatly appreciated.

Edit: I realised that my explanation was quite bad and partly wrong, so I have updated this post.


1 Answer

Editorial Team · Answer 1 · 2026-05-20T01:50:08+00:00

In general, if you want to match a string not ending in a dot, just add (?<!\.)$ to the end of the regex.

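(?<!\.) is a negative lookbehind assertion: it succeeds only if the character immediately before the current position is not a dot, and it consumes no characters itself.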
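To see the general case in action, here is a minimal sketch in Python rather than PHP (Python's re module accepts the same lookbehind syntax). The function name ends_without_dot is made up for this illustration.

```python
import re

def ends_without_dot(text):
    # (?<!\.)$ can only succeed at the end of the string, and only if the
    # character just before that position is not a literal dot.
    return re.search(r"(?<!\.)$", text) is not None

print(ends_without_dot("Lorem: Ipsom dolor sit"))   # True
print(ends_without_dot("Lorem ipsum dolor sit."))   # False
print(ends_without_dot("Heading:"))                 # True
```

Note that a trailing newline counts as the last character here, so strip it first if the input may end in one.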

In your special case, though, the match has to continue past the end of the heading, so anchoring with $ is not enough and we need a different approach:

/\n{2,}([ \wåäöÅÄÖ,()%+\-:.]{2,75}(?<!\.))\n+/

will match any line that

follows two or more newlines (\n{2,}),
consists only of 2 to 75 allowed characters ([ \wåäöÅÄÖ,()%+\-:.]{2,75}),
doesn’t end in . ((?<!\.)),
and is followed by at least one newline (\n+).
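As a rough check of the four conditions above, the same pattern can be tried in Python (a sketch only; the sample text and the name find_headings are invented for the example, and Python's re stands in for PCRE):

```python
import re

# Same pattern as above: the capture group holds the heading text, and the
# lookbehind rejects a candidate whose last character is a dot.
HEADING = re.compile(r"\n{2,}([ \wåäöÅÄÖ,()%+\-:.]{2,75}(?<!\.))\n+")

def find_headings(text):
    # With a single capture group, findall returns the group contents.
    return HEADING.findall(text)

sample = (
    "Intro paragraph.\n"
    "\n"
    "Chapter 1: Getting started\n"
    "Body text here.\n"
    "\n"
    "This is a normal sentence.\n"
    "More body.\n"
)

print(find_headings(sample))  # ['Chapter 1: Getting started']
```

The second candidate line is preceded by two newlines as well, but it ends in a dot, so the lookbehind fails and no shorter match can be followed by \n+.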

EDIT:

A new, expanded regex, trying to incorporate some of the logic discussed in the comments below; formatted as a verbose regex:

preg_match_all(
    '/(?<=\n\n)   # Assert that there are two newlines before the current position
    ^             # Assert that we\'re at the start of a line
    (?![\d -]+$)  # Assert that the line does not consist solely of digits, spaces and hyphens
                  # Assert that the line doesn\'t consist of two Uppercase Words
    (?!\s*\p{Lu}\p{L}*\s+\p{Lu}\p{L}*\s*$)
                  # Match 2-75 of the allowed characters
    [ \wåäöÅÄÖ,()%+\-:.]{2,75}
    (?<!\.)       # Assert that the last one isn\'t a dot
    $             # Assert position at the end of a line
    (?=\n)        # Assert that one newline follows.
    /mxu', 
    $subject, $result, PREG_PATTERN_ORDER);
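For anyone who wants to experiment outside PHP, below is an approximate Python port of the verbose regex. It is a sketch under assumptions: Python's standard re module has no \p{Lu} or \p{L}, so the "two Uppercase Words" check is narrowed to explicit letter ranges (A-Z plus åäö), which is my substitution and not part of the regex above. The sample text and the name find_headings_verbose are invented as well.

```python
import re

HEADING_VERBOSE = re.compile(
    r"""
    (?<=\n\n)        # two newlines before the current position
    ^                # start of a line
    (?![\d -]+$)     # line is not solely digits, spaces and hyphens
                     # line is not exactly two capitalised words
                     # (letter ranges approximate \p{Lu} and \p{L})
    (?!\s*[A-ZÅÄÖ][A-Za-zåäöÅÄÖ]*\s+[A-ZÅÄÖ][A-Za-zåäöÅÄÖ]*\s*$)
    [ \wåäöÅÄÖ,()%+\-:.]{2,75}   # 2-75 of the allowed characters
    (?<!\.)          # the last one is not a dot
    $                # end of the line
    (?=\n)           # a newline follows
    """,
    re.MULTILINE | re.VERBOSE,
)

def find_headings_verbose(text):
    # No capture groups, so findall returns the full matches.
    return HEADING_VERBOSE.findall(text)

sample = (
    "Preamble text.\n\n"
    "Chapter 1: Getting started\n"
    "Body text follows here.\n\n"
    "2024 - 2025\n"
    "more body text.\n\n"
    "John Smith\n"
    "more body text again.\n\n"
    "This sentence ends with a dot.\n"
    "trailing body.\n"
)

print(find_headings_verbose(sample))  # ['Chapter 1: Getting started']
```

The digits-only line, the two-word name and the sentence ending in a dot are each rejected by one of the assertions, so only the real heading is left.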
