I am writing a program in C# that compares strings similarly to the way that Google searches documents for keywords.
I want a search for “stack overflow” to return true for “stack overflow” (plain), “This is the stack overflow.” (in the middle), “Welcome to Stack Overflow.” (case insensitive), “I like stack overflow.” (variable whitespace), and “Who puts a dash in stack-overflow?”, but not “stackoverflow” (no whitespace).
I was thinking that I could use a regular expression like “stack([ -]|. )+overflow”, but it seems like overkill to replace every space with a character set by hand for each new keyword. Because “stack overflow” is not the only string I am searching for, I have to do it programmatically.
To meet your specification, you could first transform your plain text search string into a regular expression that also allows punctuation in places where there used to be only whitespace, and then apply that regex to whatever text you’re searching.
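One way that transformation could look in C# is sketched below. It is not necessarily the exact code intended here: the class and method names (KeywordSearch, BuildPattern, Matches) are made up for illustration, and the separator class [\s\p{P}]+ (one or more whitespace or punctuation characters) is an assumption about what should be allowed between words.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSearch
{
    // Hypothetical helper: turns a plain-text keyword phrase into a regex.
    public static Regex BuildPattern(string keywords)
    {
        // Split on runs of whitespace (a null separator means "any whitespace")
        // and escape each word so regex metacharacters are matched literally.
        var words = keywords
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Regex.Escape(w));

        // Assumption: between two words, accept one or more whitespace or
        // punctuation characters, but never zero.
        string pattern = string.Join(@"[\s\p{P}]+", words);

        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Returns true if the keyword phrase occurs anywhere in the text.
    public static bool Matches(string keywords, string text)
    {
        return BuildPattern(keywords).IsMatch(text);
    }
}
```

With this sketch, KeywordSearch.Matches("stack overflow", "Who puts a dash in stack-overflow?") returns true, while KeywordSearch.Matches("stack overflow", "stackoverflow") returns false because at least one separator character is required. Escaping each word first keeps keywords such as “C++” from being read as regex syntax.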
But of course this will fail to match on the slightest typo, whereas an algorithm using Levenshtein distance will also match “Stak Overfloor”.
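For comparison, a minimal sketch of Levenshtein distance in C# follows. The class name FuzzyMatch is hypothetical, and the function compares two whole strings only; finding a fuzzy match inside a longer text, and picking a distance threshold that counts as a match, are left open here.

```csharp
using System;

public static class FuzzyMatch
{
    // Classic dynamic-programming edit distance using two rolling rows.
    // Counts the minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }

            // Swap the rows so that prev holds the row just computed.
            var tmp = prev;
            prev = curr;
            curr = tmp;
        }

        return prev[b.Length];
    }
}
```

After lowercasing both inputs (for example with ToLowerInvariant), FuzzyMatch.Levenshtein("stak overfloor", "stack overflow") is 3: one inserted “c”, one substitution, and one deletion. Comparing the strings as written would also count the two capital letters as substitutions.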