I am trying to split raw text into sentences, so I use preg_split() and split the text on each occurrence of ?, . and ;. As expected, I ran into problems with special cases of ., for example “Dr.”, “Mr.”, etc.
How can I exclude such words or patterns from splitting?
preg_split('/(\. )|(\? )|(\; )!(Mr\.)/', $content);
You can add a negative lookbehind to the regex to make sure that the dot is not preceded by “Mr” and company.
I also simplified the regex a little bit. You should also consider substituting \s|$ (any whitespace or end of input) for the single space at the end of the current expression.
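As a rough sketch of the lookbehind approach, assuming the abbreviations to protect are Mr, Mrs and Dr (the list, the sample text, and the variable names are illustrative, not taken from the question):

```php
<?php
// Hypothetical sample text for illustration.
$content = "Dr. Smith met Mr. Jones. Did they talk? Yes; briefly.";

// The character class [.?;] replaces the three alternatives. Each negative
// lookbehind rejects a match when the punctuation directly follows one of
// the assumed abbreviations (\b keeps words like "harDr" from being protected).
$strict = '/(?<!\bMr)(?<!\bMrs)(?<!\bDr)[.?;] /';

// Same idea, but any run of whitespace or the end of input may follow
// the punctuation instead of exactly one space.
$loose = '/(?<!\bMr)(?<!\bMrs)(?<!\bDr)[.?;](?:\s+|$)/';

// PREG_SPLIT_NO_EMPTY drops the empty piece left after the final period.
$sentences = preg_split($loose, $content, -1, PREG_SPLIT_NO_EMPTY);
print_r($sentences);
```

With this sample, $loose gives four pieces: "Dr. Smith met Mr. Jones", "Did they talk", "Yes" and "briefly". Separate lookbehinds are used because each one must have a fixed length; extend the list with whatever abbreviations appear in your text.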