I am building a natural language processor in C#, and many ‘words’ in our database are actually multiple-word phrases that refer to one noun or action. Please, no discussion of this design call; suffice it to say it cannot be changed at this time. I have string arrays of related words (chunks) of the sentence that I need to test for these phrases and words. What is an appropriately idiomatic way to handle sub-array extraction so that I run the least risk of index-out-of-range errors and the like?
To give an example of the desired logic, let me step through a run with a sample chunk. For our purposes, assume that the only multiple-word phrase from the database is ‘quick brown’.
Full phrase: The quick brown fox -> encoded as {"The", "quick", "brown", "fox"}
First iteration: Test "The quick brown fox" -> returns nothing
Second iteration: Test "The quick brown" -> returns nothing
Third iteration: Test "The quick" -> returns nothing
Fourth iteration: Test "The" -> returns value
Fifth iteration: Test "quick brown fox" -> returns nothing
Sixth iteration: Test "quick brown" -> returns value
Seventh iteration: Test "fox" -> returns value
Sum all returned values and return.
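To make the walkthrough above concrete, here is a minimal sketch of that greedy longest-match loop. The names `PhraseMatcher`, `SumMatches`, and the `lookup` delegate are hypothetical; `lookup` stands in for whatever database query actually returns a value, and returning `null` for "nothing" is an assumption.

```csharp
using System;

public static class PhraseMatcher
{
    // 'lookup' is a hypothetical stand-in for the database query:
    // it returns a value for a known word/phrase, or null for "nothing".
    public static int SumMatches(string[] words, Func<string, int?> lookup)
    {
        int total = 0;
        int start = 0;
        while (start < words.Length)
        {
            int consumed = 1; // always advance at least one word, even on no match

            // Try the longest candidate first, then shrink it one word at a time.
            for (int length = words.Length - start; length >= 1; length--)
            {
                // start + length never exceeds words.Length, so this cannot go out of range.
                string candidate = string.Join(" ", words, start, length);
                int? value = lookup(candidate);
                if (value.HasValue)
                {
                    total += value.Value;
                    consumed = length; // skip past every word in the matched phrase
                    break;
                }
            }
            start += consumed;
        }
        return total;
    }
}
```

All index arithmetic lives in the two loop headers: `start` is always less than `words.Length`, and `length` starts at the number of remaining words and only shrinks, so the `string.Join(separator, array, startIndex, count)` call always receives a valid range.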
I have some ideas of how to go about this, but the more I look at things, the more worried I get about array addressing errors and other such horrors plaguing my code. The phrase comes in as a string array, but I’m fine with converting it to an IEnumerable. My only concern there is an IEnumerable’s lack of an index.
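On the indexing concern, one possible approach (a sketch, not necessarily the best fit here) is LINQ's `Skip` and `Take`, which select a sub-range by position and simply yield fewer items when the range runs past the end instead of throwing. The `ChunkSlices` class and `Slice` method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkSlices
{
    // Joins up to 'length' words beginning at position 'start'.
    // Skip/Take clamp to the end of the sequence, so an oversized
    // range yields a shorter (possibly empty) result, not an exception.
    public static string Slice(IEnumerable<string> words, int start, int length)
    {
        return string.Join(" ", words.Skip(start).Take(length));
    }
}
```

For the sample chunk, `ChunkSlices.Slice(words, 1, 2)` gives `"quick brown"`, while `ChunkSlices.Slice(words, 3, 10)` gives just `"fox"`. The trade-off is that each call re-enumerates the sequence from the beginning, which is fine for sentence-sized chunks but wasteful for large inputs.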
The path forward here lay in combining Mark’s and Philipp’s answers. Under ideal circumstances I would have edited one of their posts to include the combined solution, but it appears that my edits were rejected.
Anyway, I took the DelimitedArray that Mark linked and changed a few things in it:
Constructor changed to:
Index reference changed to:
I then worked that into Philipp’s loop usage. This becomes:
If I could accept more than one answer, I’d mark both of theirs, but as each is incomplete on its own, I will have to accept my own answer when I am able to do so tomorrow.