I have a string which is a fragment of a book (it's around 1


Asked: June 4, 2026


I have a string which is a fragment of a book (it's around one chapter long).
The whole string is on a single line.
I would like to insert a newline at the end of each sentence.

I solved it with this not-so-sophisticated code:

text = text.replaceAll("\\.","\\.\n"); //same for ? same for !

Of course, this does not yield very nice results.
I don't need this to be perfect, but the nicer I can get it, the better.
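As a baseline, the three separate replaceAll calls (one each for ., ? and !) can be merged into a single pass with a character class and a back-reference. This sketch also swallows the whitespace after the terminator so the next line does not start with a space. It is still the naive approach and will happily split after abbreviations and decimals; the class name NaiveSplit is only illustrative.

```java
public class NaiveSplit {
    // Puts a newline after every '.', '?' or '!' and drops any whitespace that follows it.
    // "$1" re-inserts the captured terminator; "\n" in the Java literal is a real newline.
    public static String splitSentences(String text) {
        return text.replaceAll("([.?!])\\s*", "$1\n");
    }

    public static void main(String[] args) {
        System.out.print(splitSentences("It rained. Did it? Yes!"));
    }
}
```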

Before inserting a newline, I would like at least to check the following:

the word before the . is longer than 2 characters
there are no other dots before the . in the same "word"
the character before the . is not a digit
the character after the . (skipping any whitespace after it) is not a (
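The four checks above can be sketched as a small scanner. This is a minimal sketch under a few assumptions: a "word" is taken to be the run of non-whitespace characters directly before the dot, the checks apply only to '.', and '?' and '!' always end a sentence (as in the original replaceAll approach). The class and method names (RuleSplit, split, periodEndsSentence) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RuleSplit {
    // Splits text into trimmed sentences; '?' and '!' always end a sentence,
    // '.' only ends one when periodEndsSentence() says so.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0; // index where the current sentence begins
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ends = c == '?' || c == '!' || (c == '.' && periodEndsSentence(text, i));
            if (ends) {
                String s = text.substring(start, i + 1).trim();
                if (!s.isEmpty()) sentences.add(s);
                start = i + 1;
            }
        }
        String rest = text.substring(start).trim(); // trailing text without a terminator
        if (!rest.isEmpty()) sentences.add(rest);
        return sentences;
    }

    // Applies the four checks from the question to the '.' at index dot.
    public static boolean periodEndsSentence(String text, int dot) {
        // Walk back to the start of the "word" (run of non-whitespace chars) before the dot.
        int wordStart = dot;
        while (wordStart > 0 && !Character.isWhitespace(text.charAt(wordStart - 1))) {
            wordStart--;
        }
        String word = text.substring(wordStart, dot);
        if (word.length() <= 2) return false;                       // rule 1: word longer than 2 chars
        if (word.indexOf('.') >= 0) return false;                   // rule 2: no earlier dot in the word
        if (Character.isDigit(text.charAt(dot - 1))) return false;  // rule 3: not right after a digit
        // rule 4: the next non-whitespace character must not be '('
        int next = dot + 1;
        while (next < text.length() && Character.isWhitespace(text.charAt(next))) next++;
        return next >= text.length() || text.charAt(next) != '(';
    }

    public static void main(String[] args) {
        String text = "Mr. Smith paid 3.50 dollars. See e.g. the appendix. Really? Yes!";
        System.out.println(String.join("\n", split(text)));
    }
}
```

Joining the result with String.join("\n", ...) gives the one-sentence-per-line string the question asks for. Note that these rules still break after longer abbreviations such as "Mrs." and split an ellipsis after its first dot.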

Any other suggestions would be much appreciated, along with actual code that implements them.


1 Answer

Editorial Team · Answer 1 · 2026-06-04T03:13:50+00:00

Stanford’s CoreNLP toolkit has a class that does sentence segmentation. See more here.

Calling new DocumentPreprocessor(new StringReader(s)).iterator(), where s is the string containing the text, gives you an iterator over the sentences (each sentence is a List<HasWord> of tokens).

Note that this also tokenizes each sentence. If you want the sentences to look the way they did originally, you can either use this output only as a guide for where to split your original string, or run PTBTokenizer with the -untok option (see the same link as above) to turn each tokenized sentence back into normal text.

This will almost certainly work better than your list of rules, since those rules miss many important cases, for example abbreviations of three or more letters such as "Mrs." or "Prof.".
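If adding CoreNLP as a dependency is too heavy, the JDK ships its own sentence segmenter, java.text.BreakIterator. This is a different tool from the CoreNLP class above, with its own built-in rules; to my knowledge it can still be fooled by abbreviations, so treat it as a middle ground between hand-written rules and a full NLP toolkit. A minimal sketch (the class name JdkSentenceSplit is illustrative):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class JdkSentenceSplit {
    // Splits text into trimmed sentences using the JDK's sentence BreakIterator.
    public static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Each boundary returned by next() ends the sentence that began at start.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(String.join("\n", split("It rained all day. Did it stop? No!")));
    }
}
```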
