I am using SphinxQL to query the Sphinx search engine. I want to simulate SPH_MATCH_ANY, which the PHP API exposes like this:
require_once 'sphinxapi.php';
$cl = new SphinxClient();
$cl->SetMatchMode(SPH_MATCH_ANY);
$cl->Query("test query", "index");
=> searches for documents matching “test” OR “query”
So I have written a PHP function that replaces spaces and other special characters with pipes (|), so that the string can be used with SphinxQL:
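function formatQuery($str) {
    // Replace every run of characters other than letters, digits, -, _ and ' with "|"
    // (the /i flag keeps upper-case letters), then strip leading/trailing pipes and spaces.
    return trim(preg_replace('/[^-_\'a-z0-9]+/i', '|', $str), ' |');
}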
$str = "test query";
$sql = "SELECT * FROM index WHERE MATCH('" . addslashes(formatQuery($str)) . "')";
=> SELECT * FROM index WHERE MATCH('test|query')
The problem is that some characters, such as - (minus), can break the query. For example:
$str = "i-phone is great";
$sql = "SELECT * FROM index WHERE MATCH('" . addslashes(formatQuery($str)) . "')";
=> SELECT * FROM index WHERE MATCH('i-phone|is|great')
=> OK
$str = "i - phone is great";
$sql = "SELECT * FROM index WHERE MATCH('" . addslashes(formatQuery($str)) . "')";
=> SELECT * FROM index WHERE MATCH('i|-|phone|is|great')
=> broken query because of “|-|” (a lone - is read as the NOT operator with nothing to negate)
Is there a better way to make SphinxQL queries behave like SPH_MATCH_ANY, or a better regexp that works in all cases?
I know I could use a more restrictive regexp like this:
preg_replace('/[^a-z0-9]+/', '|', $str)
but it would split strings like “i-phone is great” into 'i|phone|is|great', and I don't want that.
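One possible answer to the "better regexp" part, sketched under the assumption that -, _ and ' should only count as part of a word when they sit between other word characters. Instead of a single preg_replace, it splits the input into candidate words and trims those joiner characters from the ends of each word, so a lone "-" vanishes while "i-phone" survives. The name formatQueryAny is hypothetical, and like the original it only treats ASCII letters and digits as word characters.

```php
<?php
// Hypothetical helper: split into words, then trim joiner characters (-, _, ')
// from both ends of each word so stray operators cannot reach Sphinx.
function formatQueryAny(string $str): string
{
    // Anything that is not a letter, digit, hyphen, underscore or apostrophe
    // is a word separator; /i keeps upper-case letters.
    $tokens = preg_split('/[^-_\'a-z0-9]+/i', $str, -1, PREG_SPLIT_NO_EMPTY);

    $words = [];
    foreach ($tokens as $token) {
        // "-" -> "", "-phone" -> "phone", "i-phone" stays "i-phone".
        $token = trim($token, "-_'");
        if ($token !== '') {
            $words[] = $token;
        }
    }
    // Join with "|" so the result is an OR expression in extended syntax.
    return implode('|', $words);
}

echo formatQueryAny("i - phone is great"), PHP_EOL; // i|phone|is|great
echo formatQueryAny("i-phone is great"), PHP_EOL;   // i-phone|is|great
```

The result still needs addslashes() (or the connection's escaping function) before it goes into the SphinxQL string literal, exactly as in the original code.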
Thank you,
Nico
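One way might be to use the quorum operator, e.g. MATCH('"test query"/1'), which matches documents containing at least 1 of the listed words.
You will need to add - to your charset_table, though, so that it becomes part of a word.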
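A minimal sketch of the quorum approach, assuming the user input is taken as a plain list of words and that - has been added to charset_table as suggested (otherwise Sphinx simply treats it as a separator). All three function names are hypothetical. The escape list in escapeSphinxQuery is modelled on EscapeString() from the official sphinxapi.php; check it against your Sphinx version. The SphinxQL literal is escaped separately, so a backslash added for the query syntax is doubled once more for the string literal.

```php
<?php
// Hypothetical helper: backslash-escape Sphinx extended-syntax operators so user
// input cannot inject them (list modelled on EscapeString() in sphinxapi.php).
function escapeSphinxQuery(string $str): string
{
    $from = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '=', '<'];
    $to   = ['\\\\', '\(', '\)', '\|', '\-', '\!', '\@', '\~', '\"', '\&', '\/', '\^', '\$', '\=', '\<'];
    // str_replace works left to right; backslash goes first so the backslashes
    // added by later replacements are not doubled again.
    return str_replace($from, $to, $str);
}

// Hypothetical helper: wrap a value in a single-quoted SphinxQL string literal.
// With a live connection, mysqli::real_escape_string() is the safer choice.
function quoteSphinxQLString(string $str): string
{
    return "'" . str_replace(['\\', "'"], ['\\\\', "\\'"], $str) . "'";
}

// "w1 w2 ... wN"/1 matches documents containing at least one of the N words.
// $index is assumed to be a trusted identifier and is not escaped.
function buildMatchAnyQuery(string $index, string $userInput): string
{
    $words  = preg_split('/\s+/', trim($userInput), -1, PREG_SPLIT_NO_EMPTY);
    $quorum = '"' . escapeSphinxQuery(implode(' ', $words)) . '"/1';
    return 'SELECT * FROM ' . $index . ' WHERE MATCH(' . quoteSphinxQLString($quorum) . ')';
}

echo buildMatchAnyQuery('index', 'i - phone is great'), PHP_EOL;
// SELECT * FROM index WHERE MATCH('"i \\- phone is great"/1')
```

Compared with joining words by "|", the quorum form keeps the input as one phrase-like list, so a stray - ends up escaped inside the quotes instead of becoming a dangling operator.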