lets say I have a string that I want to split based on several characters, like ".", "!", and "?". How do I figure out which one of those characters split my string so I can add that same character back on to the end of the split segments in question?
Dim linePunctuation as Integer = 0
Dim myString As String = "some text. with punctuation! in it?"
For i = 1 To Len(myString)
If Mid$(entireFile, i, 1) = "." Then linePunctuation += 1
Next
For i = 1 To Len(myString)
If Mid$(entireFile, i, 1) = "!" Then linePunctuation += 1
Next
For i = 1 To Len(myString)
If Mid$(entireFile, i, 1) = "?" Then linePunctuation += 1
Next
Dim delimiters(3) As Char
delimiters(0) = "."
delimiters(1) = "!"
delimiters(2) = "?"
currentLineSplit = myString.Split(delimiters)
Dim sentenceArray(linePunctuation) As String
Dim count As Integer = 0
While linePunctuation > 0
sentenceArray(count) = currentLineSplit(count)'Here I want to add what ever delimiter was used to make the split back onto the string before it is stored in the array.'
count += 1
linePunctuation -= 1
End While
If you add a capturing group to your regex like this:
Then the returned array contains both the text between the punctuation, and separate elements for each punctuation character. The
Split()function in .NET includes text matched by capturing groups in the returned array. If your regex has several capturing groups, all their matches are included in the array.This splits your sample into:
You can then iterate over the array to get your “sentences” and your punctuation.