I’m implementing search on my website, and would like to support searching for exact phrases. I want to end up with an array of terms to search for; here are some examples:
"foobar \"your mom\" bar foo" => ["foobar", "your mom", "bar", "foo"]
"ruby rails'test course''test lesson'asdf" => ["ruby", "rails", "test course", "test lesson", "asdf"]
Notice that there doesn’t necessarily have to be a space before or after the quotes.
I’m not well versed in regular expressions, and it seems unnecessary to try to split it repeatedly on single characters. Can anybody help me out? Thanks.
You want to use this regular expression (see on rubular.com):
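Assembled from the three alternates described below, the pattern would look roughly like this. It is a sketch rather than a verbatim copy; the constant name TOKEN_PATTERN and the sample query are only illustrative.

```ruby
# Pattern assembled from the three alternates: a double-quoted token,
# a single-quoted token, or a run of non-quote, non-whitespace characters.
TOKEN_PATTERN = /"[^"]*"|'[^']*'|[^"'\s]+/

query = %q{foobar "your mom" bar foo}

# scan returns every match of the pattern; quoted tokens keep their quotes.
tokens = query.scan(TOKEN_PATTERN)
p tokens
# => ["foobar", "\"your mom\"", "bar", "foo"]
```

Note that the quoted tokens still carry their surrounding quotes at this point, so they need to be stripped in a separate step if you want the bare phrase.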
This regex matches the tokens instead of the delimiters, so you’d want to use scan instead of split.

The […] construct is called a character class. [^"] is “anything but the double quote”.

There are essentially 3 alternates:
"[^"]*" – double-quoted token (may include spaces and single quotes)
'[^']*' – single-quoted token (may include spaces and double quotes)
[^"'\s]+ – a token consisting of one or more of anything but quotes and whitespace
Snippet
Here’s a Ruby implementation:
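A minimal sketch along those lines, assuming you also want the surrounding quotes removed so the result matches the arrays in the question. The method name search_terms is hypothetical, and dropping empty phrases such as "" is an added assumption, not something the question asks for.

```ruby
# Hypothetical helper: split a search query into terms, treating quoted
# phrases as single terms.
def search_terms(query)
  query
    .scan(/"[^"]*"|'[^']*'|[^"'\s]+/)                      # match tokens, not delimiters
    .map { |token| token.sub(/\A(["'])(.*)\1\z/m, '\2') }  # strip a matching pair of outer quotes
    .reject(&:empty?)                                      # assumption: drop empty phrases like ""
end

p search_terms(%q{foobar "your mom" bar foo})
# => ["foobar", "your mom", "bar", "foo"]

p search_terms(%q{ruby rails'test course''test lesson'asdf})
# => ["ruby", "rails", "test course", "test lesson", "asdf"]
```

An unbalanced quote (for example foo "bar) matches none of the alternates, so scan simply skips that quote character and still returns the words around it.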
The above prints (as seen on ideone.com):