My goal is to run through a database of strings and count how many times each substring occurs. In other words, I need to extract every sequence of consecutive words from each string.
For example the input might be "this is the first string".
I would want to extract "this is", "is the", "the first", "first string", "this is the", "is the first", "the first string", "this is the first", "is the first string".
I only need to go left to right, always in order.
I am not really sure where to start with this. I already have the code to read the database and save it into a list; I just need to know how to extract all possible substrings, splitting on the space character.
The following method builds up a list of the indices of all spaces in your string (plus notional start and end spaces), then returns the substring between every ordered pair of indices:
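A minimal Python sketch of the approach just described, offered as an illustration rather than a definitive implementation. The function name `word_substrings` and the parameter name `min_length` are assumptions made for this example, and it assumes the words are separated by single spaces with no leading or trailing space.

```python
def word_substrings(text, min_length=1):
    """Return every run of consecutive words in `text`, left to right."""
    # Indices of the real spaces, plus a notional space just before the
    # first character (-1) and one just after the last (len(text)).
    spaces = [-1] + [i for i, ch in enumerate(text) if ch == " "] + [len(text)]

    results = []
    for start in range(len(spaces)):
        # Pairing boundary `start` with boundary `start + n` spans n words,
        # so starting at `start + min_length` enforces the minimum word count.
        for end in range(start + min_length, len(spaces)):
            results.append(text[spaces[start] + 1 : spaces[end]])
    return results
```

Because the loops visit every ordered pair of boundaries, the full string itself is included among the results.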
Called as follows
it gives
Changing `minLength` to `2` will cut out the single-word returns.
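To get from the extracted substrings to the counts the question asks for, one option is to tally them with `collections.Counter`. The sketch below is a hedged Python illustration: it swaps the space-index bookkeeping for a simpler `str.split()` plus slicing variant, and the function name `count_phrases` and the sample rows are made up for the example.

```python
from collections import Counter


def count_phrases(strings, min_length=2):
    """Count every run of at least `min_length` consecutive words."""
    counts = Counter()
    for text in strings:
        words = text.split()  # splits on any run of whitespace
        for start in range(len(words)):
            # `end` is exclusive, so it runs up to len(words) inclusive.
            for end in range(start + min_length, len(words) + 1):
                counts[" ".join(words[start:end])] += 1
    return counts


# Hypothetical stand-in for the list read from the database.
rows = ["this is the first string", "this is the second string"]
counts = count_phrases(rows)
print(counts["this is the"])   # 2
print(counts["first string"])  # 1
```

A phrase that appears twice within one string is counted twice here; counting each string at most once would mean collecting the phrases for that string into a `set` before updating the counter.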