I’m trying to create a regex containing a character set that may contain a period or colon, but the match may not end with a period. So I want to match a line saying "Lorem./: Ipsom dolor sit" but not "Lorem ipsum dolor sit."
This is my current regex, but it doesn’t work, because it still matches when the string ends with a period or colon:
/(\n{2,})([ \wåäöÅÄÖ,()%+\-:.]{2,75}[^.:])(\n{1,})/
I’m looking for headings in a huge, badly formatted plain text file. The general pattern in this file is that a heading is always preceded by two or more newlines and always followed by one or more newlines. A heading sometimes ends with a : but never with a ., although headings sometimes contain a . or :. Headings are also always 2-75 characters long and never preceded by another heading.
Any help would be greatly appreciated.
Edit: I realised that my explanation was quite bad and partly wrong, so I have updated this post.
In general, if you want to match a string not ending in a dot, just add (?<!\.)$ to the end of the regex. This is a negative lookbehind assertion.
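As a rough illustration of the general case, here is a minimal Python sketch. The names NOT_ENDING_IN_DOT and ends_without_dot are made up for this example, and the leading ^.+ is just an assumed stand-in for whatever the rest of your regex is:

```python
import re

# ^.+ stands in for the rest of the regex (an assumption for this sketch).
# (?<!\.)$ only succeeds at the end of the string if the character
# immediately before that position is not a literal dot.
NOT_ENDING_IN_DOT = re.compile(r"^.+(?<!\.)$")

def ends_without_dot(text):
    """Return True if text is non-empty and does not end in a dot."""
    return NOT_ENDING_IN_DOT.search(text) is not None

print(ends_without_dot("Lorem./: Ipsom dolor sit"))
print(ends_without_dot("Lorem ipsum dolor sit."))
```

The lookbehind consumes no characters, so it only inspects what precedes the end of the string and leaves the rest of the match unchanged.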
In your special case, the match is supposed to continue after this, though, so we need a different approach:
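One way this could look, sketched in Python: the [^.:] from the question is replaced by a lookbehind placed directly after the character class, so the match can carry on into the trailing newlines. This is an assumption about how the pieces fit together, not necessarily the exact regex intended here. The name HEADING and the sample text are invented for the example.

```python
import re

# Same character class and 2-75 length bound as in the question.
# (?<!\.) sits right after the class, so the heading text itself
# may contain dots and colons but cannot end in a dot.
HEADING = re.compile(r"(\n{2,})([ \wåäöÅÄÖ,()%+\-:.]{2,75}(?<!\.))(\n+)")

# Made-up sample: two real headings and one ordinary sentence
# that also follows a blank line but ends in a dot.
text = (
    "Some body text.\n\n"
    "Lorem: ipsum dolor sit\n"
    "Body under the heading.\n\n"
    "Lorem ipsum dolor sit.\n"
    "More body text.\n\n"
    "Summary:\n"
    "Final line.\n"
)

headings = [m.group(2) for m in HEADING.finditer(text)]
print(headings)
```

With this sample, the sentence ending in a dot is skipped because every shorter prefix of it is followed by something other than a newline. A "/" is not in the character class, so a heading containing one would need the class extended.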
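The regex for this case will match any line that
- is preceded by at least two newlines: \n{2,}
- consists of characters from the set [ \wåäöÅÄÖ,()%+\-:.]
- does not end in a dot: (?<!\.)
- is followed by at least one newline: \n+
EDIT: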
A new, expanded regex, trying to incorporate some of the logic discussed in the comments below; formatted as a verbose regex: