Right now I am seeing if a sentence contains a specific word by splitting the sentence into an array and then doing an include to see if it contains the word. Something like:
"This is my awesome sentence.".split(" ").include?('awesome')
But I’m wondering what the fastest way to do this with a phrase is. Like if I wanted to see if the sentence “This is my awesome sentence.” contains the phrase “my awesome sentence”. I am scraping sentences and comparing a very large number of phrases, so speed is somewhat important.
Here are some variations:
And the results for Ruby 1.9.3:
And 1.8.7:
So, from the results, using a fixed string search like 'foobar'['foo'] is slower than using a regex 'foobar'[/foo/], which is slower than the equivalent 'foobar' =~ /foo/.
The OP's original solution suffers badly because it traverses the string twice: once to split it into individual words, and a second time to iterate over the array looking for the actual target word. Its performance will degrade further as the string size increases.
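For reference, here is a minimal sketch of what each of those lookup styles returns when applied to the phrase from the question. It is not the benchmark code; the method name phrase_checks is made up for illustration, and String#include? is added only for comparison since it is not one of the forms timed above.

```ruby
# Hypothetical helper (not from the original benchmark): returns what each
# lookup style yields for the given phrase.
def phrase_checks(sentence, phrase)
  # Regexp.escape keeps regex metacharacters in the phrase from being interpreted.
  pattern = /#{Regexp.escape(phrase)}/
  {
    fixed_string: sentence[phrase],            # matched substring or nil
    regex_index:  sentence[pattern],           # matched substring or nil
    match_op:     sentence =~ pattern,         # offset of the match or nil
    include:      sentence.include?(phrase),   # true or false
    # The split approach compares whole tokens, so a multi-word phrase never matches.
    split_words:  sentence.split(" ").include?(phrase)
  }
end

result = phrase_checks("This is my awesome sentence.", "my awesome sentence")
```

Any of the first four can be used as a truthy/falsy test. The split-based check is the odd one out: it cannot find a phrase at all, and even for single words the trailing punctuation stays attached to the token ("sentence." rather than "sentence").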
One thing I find interesting about Ruby's performance is that an anchored regex is slightly slower than an unanchored regex. In Perl, the opposite was true when I first ran this sort of benchmark several years ago.
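If you want to check the anchored versus unanchored difference on your own Ruby, a rough sketch using the standard library's Benchmark module could look like the following. The helper names time_it and compare_anchoring are made up for illustration, and the \A start-of-string anchor and the 'foobar' test string are assumptions, not necessarily what the original benchmark used.

```ruby
require 'benchmark'

# Hypothetical helper: wall-clock seconds taken to run the block `iterations` times.
def time_it(iterations)
  Benchmark.realtime { iterations.times { yield } }
end

# Times the same match with and without an anchor.
# Assumption: the anchor is \A (start of string).
def compare_anchoring(str, iterations = 100_000)
  {
    unanchored: time_it(iterations) { str =~ /foo/ },
    anchored:   time_it(iterations) { str =~ /\Afoo/ }
  }
end

timings = compare_anchoring('foobar')
timings.each { |label, seconds| puts format('%-12s %.4fs', label, seconds) }
```

The numbers will vary with the Ruby version, the machine, and the length of the string, so treat a single run as a hint rather than a verdict.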
Here’s an updated version using Fruity. The various expressions return different values, but any of them can be used if you only want to know whether the target string exists. If you want to know whether the value is at the end of the string, which is what these tests check, or to get the location of the target, then some are definitely faster than others, so pick accordingly.
Which results in:
Modifying the size of the string reveals good stuff to know.
Changing to 1,000 characters:
Bumping it to 10,000:
Ruby v2.6.5 results:
and:
“Best way to find a substring in a string” is related.