I have a text and I'm using this simple regex to split it into words: `[ \n]`. It splits the text on spaces and line breaks.
I want to know if there is a way to keep the whitespace or the line break in the split words, because I will use them for simple sentence detection after some processing.
I’m using the String#split method.
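For reference, here is a minimal sketch of the current approach. The sample string is a made-up example, not taken from the question. It shows why the delimiters disappear: `split` consumes whatever the regex matches, so the matched space or newline ends up in no token.

```java
import java.util.Arrays;

public class OriginalSplit {
    // Splits on a single space or newline. The matched delimiter is consumed,
    // so it does not appear in any of the resulting tokens.
    static String[] splitWords(String text) {
        return text.split("[ \\n]");
    }

    public static void main(String[] args) {
        // Hypothetical sample text used only for illustration.
        String text = "Hello world.\nHow are you?";
        System.out.println(Arrays.toString(splitWords(text)));
        // prints [Hello, world., How, are, you?]
    }
}
```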
You can use a lookbehind, as @Piotr Findeisen suggested (+1), so that each space or line break stays attached to the end of the word before it.
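A minimal sketch of the lookbehind split, using the same hypothetical sample string as the question's setup. Each token is printed inside brackets, and the newline is shown as a literal `\n`, purely so that the kept delimiters are visible in the output.

```java
public class LookbehindSplit {
    // "(?<=[ \\n])" matches the empty position right after a space or newline,
    // so the delimiter itself is not consumed and stays on the preceding word.
    static String[] splitKeepingDelimiters(String text) {
        return text.split("(?<=[ \\n])");
    }

    public static void main(String[] args) {
        // Hypothetical sample text used only for illustration.
        String text = "Hello world.\nHow are you?";
        for (String word : splitKeepingDelimiters(text)) {
            // Escape the newline so it is visible when printed.
            System.out.println("[" + word.replace("\n", "\\n") + "]");
        }
        // prints:
        // [Hello ]
        // [world.\n]
        // [How ]
        // [are ]
        // [you?]
    }
}
```

If you would rather have each delimiter at the start of the next token instead, the lookahead form `(?=[ \\n])` splits just before the space or newline.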
Short explanation:
- `?<=` is a lookbehind: a position matches if the text immediately before it matches the regex that follows `?<=` (in this case `[ \\n]`).
- `[ \\n]` is a character class that matches any one of the characters inside the brackets, here a space or a newline (the backslash is doubled because the regex is written inside a Java string literal).

So the whole regex says: split at every position where the preceding character is a space or `\n`. Since the lookbehind only checks that character without matching (consuming) it, the spaces and newlines are not removed.
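To see that the lookbehind matches are empty, a small sketch using `Pattern` and `Matcher` can print where each match starts and ends. The helper name `matchSpans` and the input `"a b\nc"` are hypothetical, chosen only for this illustration. Every match has `start == end`, so there is nothing for `split` to remove.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Returns "start-end" for every match of the lookbehind pattern.
    // Equal start and end values mean the match is zero-width.
    static List<String> matchSpans(String text) {
        Matcher m = Pattern.compile("(?<=[ \\n])").matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Index 1 is the space and index 3 is the newline, so the empty
        // matches sit at positions 2 and 4, right after each delimiter.
        System.out.println(matchSpans("a b\nc"));
        // prints [2-2, 4-4]
    }
}
```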