Possible Duplicate:
What is a regular expression for parsing out individual sentences?
I want to split a large text into sentences. This is the regex I got from an answer here:
string[] sentences = Regex.Split(mytext, @"(?<=[\.!\?])\s+");
So I thought of using a pattern that does the split only
if a . ? or ! is followed by a space and a capital letter.
A capital letter indicates the start of a sentence.
text = " Sentence one . Sentence e.g. two ? Sentence three."
sentence[1] = Sentence one
sentence[2] = Sentence e.g. two
For problematic cases like abbreviations I intend to replace them first:
mytext = mytext.Replace("e.g.", "eg");
How can I implement this in a regex?
\p{Lu} indicates a Unicode uppercase letter (including accented letters etc.), so adding the lookahead (?=\p{Lu}) after the \s+ should do what you want.
(Note that I don’t think . or ? need to be escaped in a character class, so I’ve removed those backslashes too, but do check that this still works with those characters.)
However, note that this will still split on e.g.
Mr. Jones…
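As a rough sketch of the idea (not necessarily the exact pattern from the original answer), the lookbehind from the question can be combined with a lookahead for an uppercase letter. The helper name SplitSentences is made up for illustration, and the snippet assumes C# 9 or later so that top-level statements work:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: split on whitespace that follows '.', '!' or '?'
// and is itself followed by an uppercase letter (\p{Lu}).
// Trim() is an added assumption so that leading spaces do not end up in the first item.
string[] SplitSentences(string text) =>
    Regex.Split(text.Trim(), @"(?<=[.!?])\s+(?=\p{Lu})");

string mytext = " Sentence one . Sentence e.g. two ? Sentence three.";

foreach (string sentence in SplitSentences(mytext))
{
    Console.WriteLine(sentence);
}
// Prints:
// Sentence one .
// Sentence e.g. two ?
// Sentence three.
```

The "e.g. two" case survives because the letter after the whitespace is lowercase, but an input such as "Mr. Jones left." would still come back as two pieces, so abbreviations followed by a capitalised word need separate handling.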